519 points by binyu 1 day ago | 140 comments
tptacek 1 day ago |
simonw 1 day ago |
https://github.com/anthropics/defending-code-reference-harne... says:
> As a rough guideline, expect ~10K uncached input tokens/min and ~2K output tokens/min per agent. You can scale parallelism up to your account's ITPM limit (roughly 10 agents per 100K ITPM).
My guess would be hundreds of dollars with Opus and thousands of dollars with Mythos.
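A rough sketch of the arithmetic behind that guideline, assuming the quoted per-agent rates (~10K uncached input tokens/min, ~2K output tokens/min). The function names are made up for illustration, and the per-million-token prices in the usage lines are hypothetical placeholders, not actual Opus or Mythos pricing. It also ignores cached input tokens, so treat the result as a ballpark only.

```python
# Back-of-the-envelope estimate based on the quoted guideline.
# All names here are illustrative; prices are caller-supplied assumptions.

INPUT_TOKENS_PER_MIN_PER_AGENT = 10_000   # "~10K uncached input tokens/min" per agent
OUTPUT_TOKENS_PER_MIN_PER_AGENT = 2_000   # "~2K output tokens/min" per agent


def max_agents(itpm_limit: int) -> int:
    """Agents that fit under an input-tokens-per-minute (ITPM) limit."""
    return itpm_limit // INPUT_TOKENS_PER_MIN_PER_AGENT


def estimate_cost(agents: int, minutes: int,
                  input_price_per_mtok: float,
                  output_price_per_mtok: float) -> float:
    """Estimated spend for `agents` running in parallel for `minutes`.

    Prices are per million tokens and must be supplied by the caller;
    cached input tokens are not modeled.
    """
    input_tokens = agents * minutes * INPUT_TOKENS_PER_MIN_PER_AGENT
    output_tokens = agents * minutes * OUTPUT_TOKENS_PER_MIN_PER_AGENT
    return (input_tokens / 1_000_000 * input_price_per_mtok
            + output_tokens / 1_000_000 * output_price_per_mtok)


if __name__ == "__main__":
    agents = max_agents(100_000)  # 100K ITPM -> 10 agents, matching the guideline
    # Hypothetical prices (10 and 50 per million tokens), one hour of scanning.
    print(agents, estimate_cost(agents, 60, 10.0, 50.0))
```

Plug in your own ITPM limit, run time, and the current published prices for whichever model you use; the gap between "hundreds" and "thousands" of dollars comes almost entirely from the price inputs and how long you let the agents run.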
lanyard-textile 1 day ago |
Hm :)
yalogin about 11 hours ago |
HarHarVeryFunny about 13 hours ago |
This makes for a somewhat amusing set of product offerings given that according to Dario 90% of all software is being AI generated.
Maybe next they can sell something to find the bugs in the security scanner?
baby 1 day ago |
Every week I see bugs (as an auditor) that our own harness (https://zkao.io/) can't find, and we have to figure out pretty interesting techniques in order to make the tool find them. Mind you I'm talking mostly about cryptographic vulnerabilities, not just webapp bugs. So IMO it's going to make a lot of sense for companies to have both their own harness (as tptacek is talking about) and pay for services that focus on making a good harness from experience (and audit firms are going to be the best at doing this, as they see a lot of bugs and can spend time "teaching" their harness about these bugs)
On the other hand, you have to find equally good techniques to triage, because otherwise you just have some machinery that I call "vibe auditing" that produces enough false positives to tire all the developers (who are already overwhelmed with crappy AI submissions in bug bounties and other AI tools that review all of their PRs).
At the end of the day, when your harness doesn't return any bugs, you're left wondering "does it mean there are no bugs?" We're basically back in this reputation game, where you want to use the best tool, or the best team (that knows what the best tools are), and need to figure out which one that is.
richardbarosky 1 day ago |
Something that stands out is that for the strongest use cases, AI companies will prefer to sell the technique as a service rather than its raw output. For use cases where the output is less valuable, tokens are sold. If AI tokens were so magical at creating new value in developing software applications generally, they wouldn't be selling tokens directly. They'd hoard the tokens and use them to dominate SaaS software in any industry they want.
It's the same way someone selling an expensive course on the stock market is signaling that they have more to gain by selling the course than by taking their knowledge and making money in the stock market directly.
dclavijo 1 day ago |
majicDave 1 day ago |
bobkb 1 day ago |
I have been working on and using a similar tool for a while now:
https://github.com/bobinson/vulture
I have been struggling with false positives and using Claude + MCP as a poor man’s audit tool. In the last few days I have found better results with NVIDIA-hosted models.
newaccount12344 1 day ago |
NotPractical about 5 hours ago |
cpard 1 day ago |
This is the equivalent of Claude Design but for security.
Different harness, different packaging and obviously different distribution because the persona is different.
It’s funny because from all the posts I’ve read from companies reporting on Mythos, everyone is building their own harness for it.
Cisco even published a specification for one.
But Anthropic is the one who has figured out how to package and distribute this. Great GTM!
sciencejerk about 23 hours ago |
trilogic 1 day ago |
Be aware: the .py/s will not pass the antivirus but basically they do the job.
madduci about 23 hours ago |
Nice
bigmattystyles 1 day ago |
ElijahLynn 1 day ago |
That repo is Anthropics.
This post title should clarify that it is not Anthropic (no "s").
leetrout about 11 hours ago |
Like others I suspect this is exactly what they are going to paywall with product features going forward.
LazyR3nR3n about 19 hours ago |
sylware about 13 hours ago |
euroderf 1 day ago |
extr 1 day ago |
eranation 1 day ago |
tl;dr - not that it's surprising, but it's not cheap, especially if you want to do this continuously.
zoobab 1 day ago |
bartoszcki 1 day ago |
Are they making 8x more features or the same amount just with more code?
crooked-v 1 day ago |
wslh 1 day ago |
volume_tech about 14 hours ago |
Xotic007 about 12 hours ago |
sspoisk about 16 hours ago |
xuzhenpeng 1 day ago |
afford-ai 1 day ago |
eddysir about 15 hours ago |
Maya_Andersson about 10 hours ago |
EvanXue 1 day ago |
notenkidev about 24 hours ago |
aos_architect about 15 hours ago |
edgardurand 1 day ago |
continueops_com about 17 hours ago |
xinchen03 about 23 hours ago |
vladsiu about 23 hours ago |
jungfty 1 day ago |
dclavijo 1 day ago |
zoobab 1 day ago |
It was a different situation 2 years ago, when there was significant cost to building your own harness (but then: you probably weren't doing AI vuln research 2 years ago). Today, I think your best bet is to look at something like this for ideas, and then just ask for your own, to fit your own work style, with your own interface, your own notion of target and effort specification, and your own alerting.