614 points by pdyc 3 days ago | 762 comments
ValentineC 3 days ago |
f311a 3 days ago |
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes: they produce questionable architecture, and you still have to review the code if your project is serious enough.
The codebase quickly becomes a mess if you don't pay enough attention, no matter which model you use.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
thundergolfer 2 days ago |
Probably better to use the fully-loaded cost of the engineer, which is much higher than their compensation package. The fully-loaded cost is the total cost paid for the labor power of the engineer, and it includes big-ticket items such as office space, food, equipment, insurance, payroll tax, fringe benefits, and recruiting costs.
If the median compensation package is $330k/year then the median fully loaded cost is probably around $450-500k.
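A quick sketch of the arithmetic behind those figures, assuming only the numbers quoted above ($330k median compensation, $450-500k fully loaded). The function name is made up for illustration, and the resulting multiplier is simply what the guess implies, not a measured overhead rate.

```python
def loaded_cost_multiplier(compensation: float, fully_loaded: float) -> float:
    """Ratio of fully-loaded cost to the compensation package."""
    return fully_loaded / compensation


# Figures taken from the comment above; everything else is illustrative.
median_compensation = 330_000
fully_loaded_low = 450_000
fully_loaded_high = 500_000

mult_low = loaded_cost_multiplier(median_compensation, fully_loaded_low)
mult_high = loaded_cost_multiplier(median_compensation, fully_loaded_high)

# Overhead on top of compensation, in dollars per year.
overhead_low = fully_loaded_low - median_compensation
overhead_high = fully_loaded_high - median_compensation

print(f"implied multiplier: {mult_low:.2f}x to {mult_high:.2f}x")
print(f"implied overhead: ${overhead_low:,} to ${overhead_high:,} per year")
```

So the estimate amounts to roughly a 1.4x to 1.5x multiplier on compensation.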
tuesdaynight 3 days ago |
CharlieDigital 3 days ago |
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
jkwang 3 days ago |
siliconc0w 3 days ago |
It really depends on how you use it. If you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents, it's really easy to burn through many thousands.
If you're being more deliberate and using a few agents at a time interactively, having them review PRs, resolve issues, and do automated clean-ups and performance optimization, it could be more like $1500.
If you're just throwing it one-off questions like a better Stack Overflow, that is well under $100.
I've really gotten into /goal: if you can find something verifiable and leave it overnight, it's kinda like Christmas morning to see where it landed.
blobbers 2 days ago |
I definitely have written a goal file and then just run claude in a loop over the goal in order to 'token max'... why not? I'm doing research and have some clear KPIs where research into all kinds of techniques / tuning can improve the results. I can spend my budget on an "experiment with blah blah blah to improve blah blah" or give it a list of things to try that I know will take a while.
It's no problem hitting hundreds of dollars of API spend while sitting at a computer with 3 monitors and 6 windows of useful Claude Code interactive sessions, working on 2 or 3 projects and using worktrees. It's a little weird when you hit your limit by 2 o'clock and have to wait for token budgets to reset; god forbid I manually edit code... which I did do for the first time in months.
You can also start to generate a lot of token spend if you do something like "hey make me a stylized slide deck using internal skill / agent XYZ based on commits A through C", which, as an engineer, makes building presentations much less painful.
This Uber limit is not high compared to the big SV companies.
marcosdumay 2 days ago |
thesumofall 2 days ago |
linuxhansl 2 days ago |
Just looked at my spend for the past 30 days; it didn't even come to $600. 95% of my tokens are from cache. To reach even $1500 I would have to let Claude run unsupervised overnight (and with the number of mistakes it still makes and the guidance it needs, I do not believe we are there yet).
suncemoje 2 days ago |
nikolay 1 day ago |
geodel 3 days ago |
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
This whole article seems to me like multi-level marketing "businesses", where 'Diamonds' have made their money by promoting MLM in seminars and telling hopefuls at the bottom that "Buying an AI subscription now is their one shot to be a winner in life".
Perhaps there is something to MLM vs LLM to create a FOMO effect.
john01dav 3 days ago |
ChrisArchitect 3 days ago |
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
https://news.ycombinator.com/item?id=48268871
Uber torches 2026 AI budget on Claude Code in four months
https://news.ycombinator.com/item?id=47976415
Corporate America Is Starting to Ration AI as Cost Skyrockets
c7b 2 days ago |
rasbmn 3 days ago |
They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.
There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.
Probably they'll quietly reduce the number more soon.
sremani 3 days ago |
The reason I use F# & Clojure is that they target the CLR and the JVM, two popular enterprise stacks.
In my not so humble opinion, Lisp (Clojure) still remains the language of AI.
cmiles8 3 days ago |
synergy20 1 day ago |
dzonga 2 days ago |
When looking at costs, the numbers make sense. However, for decisions as an org/company/solo founder, costs help you set prices, but to reach profitability you want to model around ROI.
Now the question is: what's the ROI for a $36K investment per engineer, or $90M for the total org?
I bet the ROI is negative.
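A rough sketch of how that ROI question could be framed, using only the $36K and $90M figures above. The $475k engineer cost is a hypothetical placeholder (the midpoint of the fully-loaded estimate given elsewhere in this thread), and valuing a productivity gain as a fraction of that cost is an assumption, not a claim about how Uber accounts for it.

```python
ANNUAL_SPEND_PER_ENGINEER = 36_000   # from the comment above
TOTAL_ORG_SPEND = 90_000_000         # from the comment above
ASSUMED_ENGINEER_COST = 475_000      # hypothetical fully-loaded cost per year


def roi(value_generated: float, cost: float) -> float:
    """Standard ROI: net gain divided by cost."""
    return (value_generated - cost) / cost


def break_even_gain(spend: float, engineer_cost: float) -> float:
    """Productivity gain (as a fraction of engineer cost) that pays for the spend."""
    return spend / engineer_cost


# Head count and monthly spend implied by the two quoted figures.
implied_engineers = TOTAL_ORG_SPEND / ANNUAL_SPEND_PER_ENGINEER
monthly_spend = ANNUAL_SPEND_PER_ENGINEER / 12

threshold = break_even_gain(ANNUAL_SPEND_PER_ENGINEER, ASSUMED_ENGINEER_COST)

# Example: a 5% productivity gain, valued at 5% of the assumed engineer cost.
example_roi = roi(0.05 * ASSUMED_ENGINEER_COST, ANNUAL_SPEND_PER_ENGINEER)

print(f"implied engineers: {implied_engineers:,.0f}")
print(f"monthly spend per engineer: ${monthly_spend:,.0f}")
print(f"break-even productivity gain: {threshold:.1%}")
print(f"ROI at a 5% gain: {example_roi:.1%}")
```

Under these assumptions the ROI turns positive only above a productivity gain of about 7.6%, so whether it is negative depends entirely on a number nobody in the thread has measured.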
PessimalDecimal 3 days ago |
watershawl 2 days ago |
Wait a minute. We didn’t save money by adding AI. We just added an expense.
Now we have to pay for employees AND AI.
CSMastermind 2 days ago |
I'd guess there should be a few people Uber is basically allocating unlimited AI spending to and a large swath they're giving basically nothing.
szatkus 3 days ago |
pmontra 3 days ago |
newobj 3 days ago |
colonelspace 2 days ago |
nalekberov 2 days ago |
I was recently talking to an HR person from a European company, and she goes: 'We are forcing our developers to use AI coding agents, but they are still kind of hesitant.' This person had never written a single line of code, nor did she know what software engineering is. For these people, using AI coding agents = faster delivery without breaking anything.
epsteingpt 3 days ago |
insane_dreamer 2 days ago |
Maybe it's just me, but I still find that I really have to "shepherd" the AI and work with it to get the results I want. And I read every line of code added and challenge the model's logic. So that limits my token burning. Maybe these people are just "vibe-coding" without really checking the results?
etothet 3 days ago |
LurkandComment 3 days ago |
Avijit_Thawani about 18 hours ago |
andix 2 days ago |
(Cost of an employee is much higher than their salary, it includes things like office space, supporting structures like HR/accounting, insurance, hardware/software, and much more)
galaxyLogic 3 days ago |
827a 2 days ago |
deviation 2 days ago |
sylwk 2 days ago |
My $100 subscription is not cheap. At the same time our product burns orders of magnitude more tokens.
schnitzelstoat 2 days ago |
If you use stuff like opusplan and /advisor so you use Sonnet for most of the work and only Opus for the really complex stuff then it's quite easy to keep costs low without affecting performance.
throw1234567891 about 24 hours ago |
jwpapi 3 days ago |
Probably even less, because you would also spend those 1500 extra per employee if you just save 10%, so 150 per employee; that’s 1.5% on salary.
This is imho one of the best ranges we can assume for now. How much would that be across the whole SWE market?
ilia-a 3 days ago |
That being said, I do have to wonder why someone as big as, say, Uber does not simply roll out an OSS model in the cloud for their team. I'd imagine that would be the cheapest & most flexible option, while also keeping all the data shared with the LLM private.
zkmon 2 days ago |
packspro 2 days ago |
cyanydeez 3 days ago |
That's the most useful signal. Pre OpenAI mafia RAM pricing, that comes out to $250/month.
5701652400 3 days ago |
China will be a major token exporter soon. Mark my words.
sameersri2004 2 days ago |
meszmate 2 days ago |
But yeah, for a company at Uber’s scale, I can see why they would want real engineering discipline around it.
throw0606 1 day ago |
noncoml 2 days ago |
Their wet dream was never automation. It was zero marginal cost labor. And that dream is starting to rot.
easygenes 2 days ago |
era-epoch 2 days ago |
Oh that's actually really economical! I wonder if they're doing a lot on locally running models or managing a shared context or knowledge-base in some clever way, maybe just encouraging employees to be efficient and mindful.
...
> each employee
...
> per AI coding tool
...
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI
What on this godforsaken earth are all you rich idiots doing???
Galanwe 2 days ago |
To the mooooon!
lanthissa 1 day ago |
People self-limit when there are caps. If you give people unlimited budgets, they won't even use Sonnet for easy things.
transitorykris 2 days ago |
hrpnk 3 days ago |
ewangzzz 2 days ago |
lrvick about 24 hours ago |
Stop giving Anthropic money and figure out how to take the same money to buy some GPUs, and physically insert them into workstations. It is not that hard, I promise.
walthamstow 2 days ago |
LeicaLatte 2 days ago |
cloudking 3 days ago |
ipunchghosts 2 days ago |
cadamsdotcom 2 days ago |
Naively you’d expect to always keep paying more - but growth in token usage is what changes the equation. Amortizing debt over an exponentially growing amount of spend across a growing customer base (not per customer) lets the debt be paid off & costs covered even as each individual’s spend stays steady or even goes down - but it only works if there’s growth beyond some threshold that makes the whole thing hang together. No one on the outside knows how much growth that is, and everyone chases maximum growth.
Jevons Paradox ends up being your friend as well as the friend of the inference providers as well as the friend of the inference financiers.
If it’s a strong enough effect, it has potential to cancel out all the circular financing too, and let everyone ride out the bursting of the bubble.
KnuthIsGod 2 days ago |
jedisct1 3 days ago |
edg5000 2 days ago |
gck1 2 days ago |
gck1 2 days ago |
I also misconfigured something in my agent's configuration and a simple web tool request (maybe 4 turns) through OR went to GPT-5.5 accidentally and that cost me ~$0.4.
I have no idea how any business can afford API rates without having a mindset of casually setting money on fire.
nphardon 2 days ago |
morpheos137 2 days ago |
agentbc9000 2 days ago |
DarenWatson 1 day ago |
shivyadavus 1 day ago |
dmaso191 3 days ago |
maxothex 1 day ago |
benbct about 19 hours ago |
aitoukhrib 1 day ago |
fourleafai 1 day ago |
zftnb666 2 days ago |
Ile09 1 day ago |
eugeneonai 2 days ago |
Ozzie-D 3 days ago |
willyv3 2 days ago |
themuskgpt2025 1 day ago |
throwaway613746 3 days ago |
ashahin 3 days ago |
Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Many lower-budget individuals are now moving to Chinese open-weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inference costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.