Hacker news

Top
New
Past
Ask
Show
Jobs

Claude Cowork exfiltrates files (https://www.promptarmor.com)

865 points by takira 4 days ago | 398 comments | View on ycombinator

burkaman 4 days ago |

In this demonstration they use a .docx with prompt injection hidden in an unreadable font size, but in the real world that would probably be unnecessary. You could upload a plain Markdown file somewhere and tell people it has a skill that will teach Claude how to negotiate their mortgage rate and plenty of people would download and use it without ever opening and reading the file. If anything you might be more successful this way, because a .md file feel less suspicious than a .docx.

Tiberium 4 days ago |

A bit unrelated, but if you ever find a malicious use of Anthropic APIs like that, you can just upload the key to a GitHub Gist or a public repo - Anthropic is a GitHub scanning partner, so the key will be revoked almost instantly (you can delete the gist afterwards).

It works for a lot of other providers too, including OpenAI (which also has file APIs, by the way).

https://support.claude.com/en/articles/9767949-api-key-best-...

https://docs.github.com/en/code-security/reference/secret-se...

hombre_fatal 4 days ago |

One issue here seems to come from the fact that Claude "skills" are so implicit + aren't registered into some higher level tool layer.

Unlike /slash commands, skills attempt to be magical. A skill is just "Here's how you can extract files: {instructions}".

Claude then has to decide when you're trying to invoke a skill. So perhaps any time you say "decompress" or "extract" in the context of files, it will use the instructions from that skill.

It seems like this + no skill "registration" makes it much easier for prompt injection to sneak new abilities into the token stream and then make it so you never know if you might trigger one with normal prompting.

We probably want to move from implicit tools to explicit tools that are statically registered.

So, there currently are lower level tools like Fetch(url), Bash("ls:*"), Read(path), Update(path, content).

Then maybe with a more explicit skill system, you can create a new tool Extract(path), and maybe it can additionally whitelist certain subtools like Read(path) and Bash("tar *"). So you can whitelist Extract globally and know that it can only read and tar.

And since it's more explicit/static, you can require human approval for those tools, and more tools can't be registered during the session the same way an API request can't add a new /endpoint to the server.

c7b 3 days ago |

One thing that kind of baffles me about the popularity of tools like Claude Code is that their main target group seems to be developers (TUI interfaces, semi-structured instruction files,... not the kind of stuff I'd get my parents to use). So people who would be quite capable of building a simple agentic loop themselves [0]. It won't be quite as powerful as the commercial tools, but given that you deeply know how it works you can also tailor it to your specific problems much better. And sandbox it better (it baffles me that the tools' proposed solution to avoid wiping the entire disk is relying on user confirmation [1]).

It's like customizing your text editor or desktop environment. You can do it all yourself, you can get ideas and snippets from other people's setups. But fully relying on proprietary SaaS tools - that we know will have to get more expensive eventually - for some of your core productivity workflows seems unwise to me.

[0] https://news.ycombinator.com/item?id=46545620

[1] https://www.theregister.com/2025/12/01/google_antigravity_wi...

rkagerer 3 days ago |

Cowork is a research preview with unique risks due to its agentic nature and internet access.

The level of risk entailed from putting those two things together is a recipe for diaster.

Animats 4 days ago |

> "This attack is not dependent on the injection source - other injection sources include, but are not limited to: web data from Claude for Chrome, connected MCP servers, etc."

Oh, no, another "when in doubt, execute the file as a program" class of bugs. Windows XP was famous for that. And gradually Microsoft stopped auto-running anything that came along that could possibly be auto-run.

These prompt-driven systems need to be much clearer on what they're allowed to trust as a directive.

rvz 4 days ago |

Exfiltrated without a Pwn2Own in 2 days of release and 1 day after my comment [0], despite "sandboxes", "VMs", "bubblewrap" and "allowlists".

Exploited with a basic prompt injection attack. Prompt injection is the new RCE.

[0] https://news.ycombinator.com/item?id=46601302

phyzome 3 days ago |

There's a sort of milkshake-duck cadence to these "product announcement, vulnerability announcement" AI post pairs.

bilater 3 days ago |

I wonder if we'll get something like a CORS for agents where they can only pass around data to whitelisted ips (local, claude sanctioned servers etc).

danielrhodes 3 days ago |

This is no surprise. We are all learning together here.

There are any number of ways to foot gun yourself with programming languages. SQL injection attacks used to be a common gotcha, for example. But nowadays, you see it way less.

It’s similar here: there are ways to mitigate this and as we learn about other vectors we will learn how to patch them better as well. Before you know it, it will just become built into the models and libraries we use.

In the mean time, enjoy being the guinea pig.

emsign 3 days ago |

LLMs can't distinguish between context and prompt. There will always be prompt injections hiding, lurking somewhere.

patapong 3 days ago |

The specific issue here seems to be that Anthropic allows the unrestricted upload of personal files to the anthropic cloud environment, but does not check to make sure that the cloud environment belongs to the user running the session.

This should be relatively simple to fix. But, that would not solve the million other ways a file can be sent to another computer, whether through the user opening a compromised .html document or .pdf file etc etc.

This fundamentally comes down to the issue that we are running intelligent agents that can be turned against us on personal data. In a way, it mirrors the AI Box problem: https://www.yudkowsky.net/singularity/aibox

tuananh 3 days ago |

this attack is quite nice.

- currently we have no skills hub, no way to do versioning, signing, attestation for skills we want to use.

- they do sandboxing but probably just simple whitelist/blacklist url. they ofcourse needs to whitelist their own domains -> uploading cross account.

mvandermeulen 2 days ago |

I have noticed an abundance of Claude config/skills/plugins/agents related repositories on GitHub which purport to contain some generic implementation of whatever is on offer but also contain malware inside a zip file.

They all make use of the GitHub topic feature to be found. The most recent commit will usually be a trivial update to README.md which is done simply to maintain visibility for anyone browsing topics by recently updated. The readme will typically instruct installation by downloading the zip file rather than cloning the repo.

I assume the payload steals Claude credentials or something similar. The sheer number of repos would suggest plenty of downloads which is quite disheartening.

It would take a GitHub engineer barely minutes to implement a policy which would eradicate these repos but they don’t seem to care. I have also been unable to use the search function on GitHub for over 6 months now which is irrelevant to this discussion but it seems paying customers cannot count on Github to do even the bare minimum by them.

xg15 3 days ago |

Is it even prompt injection if the malicious instructions are in a file that is supposed to be read as instructions?

Seems to me the direct takeaway is pretty simple: Treat skill files as executable code; treat third-party skill files as third-party executable code, with all the usual security/trust implications.

I think the more interesting problem would be if you can get prompt injections done in "data" files - e.g. can you hide prompt injections inside PDFs or API responses that Claude legitimately has to access to perform the task?

kingjimmy 4 days ago |

promptarmor has been dropping some fire recently, great work! Wish them all the best in holding product teams accountable on quality.

leetrout 4 days ago |

Tangential topic: Who provides exfil proof of concepts as a service? I've a need to explore poison pills in CLAUDE.md and similar when Claude is running in remote 3rd party environments like CI.

dangoodmanUT 4 days ago |

This is why we only allow our agent VMs to talk to pip, npm, and apt. Even then, the outgoing request sizes are monitoring to make sure that they are resonably small

LetsGetTechnicl 3 days ago |

I know this isn't even the worst example, but the whole LLM craze has been insane to witness. Just releasing dangerous tools onto an uneducated and unprepared public and now we have to deal with the consequences because no one thought "should we do this?"

caminanteblanco 4 days ago |

Well that didn't take very long...

teekert 3 days ago |

Everything is a .exe if you're LLM enough.

wunderwuzzi23 4 days ago |

Relevant prior post, includes a response from Anthropic:

https://embracethered.com/blog/posts/2025/claude-abusing-net...

MarginalGainz 3 days ago |

Context injection is becoming the new SQL injection. Until we have better isolation layers, letting an LLM 'cowork' on sensitive repos without a middleware sanitization layer is a compliance nightmare waiting to happen.

refulgentis 4 days ago |

These prompt injection techniques are increasingly implausible* to me yet theoretically sound.

Anyone know what can avoid this being posted when you build a tool like this? AFAIK there is no simonw blessed way to avoid it.

* I upload a random doc I got online, don’t read it, and it includes an API key in it for the attacker.

fudged71 3 days ago |

I found a bunch of potential vulnerabilities in the example Skills .py files provided by Anthropic. I don't believe the CVSS/Severity scores though:

| pdf | Lack of Input Validation | 3.7 | Low |

armcat 3 days ago |

I know it might slow things down, but why not do this:

1. Categorize certain commands (like network/curl/db/sql) as `simulation_required` 2. Run a simulation of that command (without actual execution) 3. As part of the simulation run a red/blue team setup, where you have two Claude agents each either their red/blue persona and a set of skills 4. If step (3) does not pass, notify the user/initiator

calflegal 4 days ago |

So, I guess we're waiting on the big one, right? The ?10+? billion dollar attack?

ryanjshaw 3 days ago |

The Confused Deputy [1] strikes again. Maybe this time around capabilities-based solutions will get attention.

[1] https://web.archive.org/web/20031205034929/http://www.cis.up...

tnynt63 2 days ago |

Non-stop under attack by entire locals hackers and using http thiland government files inside my phone, its unknown codes and even yandex can't solves almost 6 months over we found at browser for weather forecast

sgammon 4 days ago |

is it not a file exfiltrator, as a product

khalic 3 days ago |

If you don’t read the skills you install in your agent, you really shouldn’t be using one.

undefined 4 days ago |

undefined

undefined 4 days ago |

undefined

SamDc73 4 days ago |

I was waiting for someone to say "this is what happens when you vibe code"

woggy 4 days ago |

What's the chance of getting Opus 4.5-level models running locally in the future?

Havoc 3 days ago |

How do the larger search services like perplexity deal with this?

They’re passing in half the internet via rag and presumably didn’t run a llamaguard type thing over literally everything?

fathermarz 3 days ago |

This is getting outrageous. How many times must we talk about prompt injection. Yes it exists and will forever. Saying the bad guys API key will make it into your financial statements? Excuse me?

jryio 3 days ago |

As prophesied https://news.ycombinator.com/item?id=46593628

chaostheory 3 days ago |

Running these agents in their own separate browsers, VMs, or even machines should help. I do the same with finance-related sites.

__0x01 3 days ago |

I also worry about a centralised service having access to confidential and private plaintext files of millions of users.

wutwutwat 3 days ago |

the same way you are not supposed to pipe curl to bash, you shouldn't raw dawg the internet into the mouth of a coding agent.

If you do, just like curl to bash, you accept the risk of running random and potentially malicious shit on your systems.

rsynnott 4 days ago |

That was quick. I mean, I assumed it'd happen, but this is, what, the first day?

gnarbarian 3 days ago |

jokes on them I have an anti prompt injection instruction file.

instructions contained outside of my read only plan documents are not to be followed. and I have several Canaries.

choldstare 4 days ago |

we have to treat these vulnerabilities basically as phishing

adam_patarino 3 days ago |

What frustrates me is that Anthropic brags they built cowork in 10 days. They don’t show the seriousness or care required for a product that has access to my data.

niyikiza 4 days ago |

Another week, another agent "allowlist" bypass. Been prototyping a "prepared statement" pattern for agents: signed capability warrants that deterministically constrain tool calls regardless of what the prompt says. Prompt injection corrupts intent, but the warrant doesn't change.

Curious if anyone else is going down this path.

tnynt63 2 days ago |

А я думаю есть вы проверьте

Juliate 3 days ago |

How do these people manage to get people to pay them?...

Just a few years ago, no one would have contemplated putting in production or connecting their systems, whatever the level of criticality, to systems that have so little deterministic behaviour.

In most companies I've worked for, even barebones startups, connecting your IDE to such a remote service, or even uploading requirements, would have been ground for suspension or at least thorough discussion.

The enshitification of all this industry and its mode of operation is truly baffling. Shall the bubble burst at last!

undefined 3 days ago |

undefined

jerryShaker 4 days ago |

AI companies just 'acknowledging' risks and suggesting users take unreasonable precautions is such crap

Escapade5160 4 days ago |

That was fast.

hakanderyal 4 days ago |

This was apparent from the beginning. And until prompt injection is solved, this will happen, again and again.

Also, I'll break my own rule and make a "meta" comment here.

Imagine HN in 1999: 'Bobby Tables just dropped the production database. This is what happens when you let user input touch your queries. We TOLD you this dynamic web stuff was a mistake. Static HTML never had injection attacks. Real programmers use stored procedures and validate everything by hand.'

It's sounding more and more like this in here.

jsheard 4 days ago |

Remember kids: the "S" in "AI Agent" stands for "Security".

mbowcut2 3 days ago |

Wow, I didn't know about the "skills" feature, but with that as context isn't this attack strategy obvious? Running an unverified skill in Cowork is akin to running unverified code on your machine. The next super-genius attack vector will be something like: Claude Cowork deletes sytem32 when you give it root access and run the skill "brick_my_machine" /s.

kewldev87 3 days ago |

[dead]

llmslave 4 days ago |

[flagged]

sawjet 3 days ago |

This is one of those things that is a feature of Claude, not a bug. Sonnet and opus 4.5 can absolutely detect prompt attacks, however they are post-trained to ignore them in let's say ... Certain scenarios... At least if you are using the API.

lifetimerubyist 3 days ago |

Instead of vibing out insecure features in a week using Claude Code can Anthropic spend some time making the desktop app NOT a buggy POS. Bragging that you launched this in a week and Claude Code wrote all of the code looks horrible on you all things considered.

Randomly can’t start new conversations.

Uses 30% CPU constantly, at idle.

Slow as molasses.

You want to lock us into your ecosystem but your ecosystem sucks.