248 points by meetpateltech 4 days ago | 146 comments
BoumTAC 4 days ago |
The frontier models have become so good that it's getting almost impossible to notice meaningful differences between them.
Meanwhile, when a smaller / less powerful model releases a new version, the jump in quality is often massive, to the point where we can now use them 100% of the time in many cases.
And since they're also getting dramatically cheaper, it's becoming increasingly compelling to actually run these models in real-life applications.
pscanf 4 days ago |
They're incredibly slow (via the official API or OpenRouter), but most of all they seem not to understand the instructions I give them. I'm sure I'm _holding them wrong_, in the sense that I'm not tailoring my prompt for them, but most other models don't have a problem with the exact same prompt.
Does anybody else have a similar experience?
HugoDias 4 days ago |
GPT 5 mini: Input $0.25 / Output $2.00
GPT 5 nano: Input $0.05 / Output $0.40
GPT 5.4 mini: Input $0.75 / Output $4.50
GPT 5.4 nano: Input $0.20 / Output $1.25
mikkelam 4 days ago |
Most "Model X > Model Y" takes on HN these days (and everywhere) seem based on an hour of unscientific manual prompting. Are we actually running rigorous, version-controlled evals, or just making architectural decisions based on whether a model nailed a regex on the first try this morning?
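A minimal sketch of the kind of versioned eval this comment is asking for: fixed cases checked into the repo next to the code, a pluggable model callable, and an exact-match grader. The `run_model` stand-in and the example cases are hypothetical; a real setup would swap in an actual API client and a larger case file.

```python
import json
from typing import Callable

CASES = [  # in practice: a JSON/YAML file under version control
    {"id": "date-1", "prompt": "Extract the year from 'Released 2023-05-01'.", "expected": "2023"},
    {"id": "math-1", "prompt": "What is 7 * 8? Answer with the number only.", "expected": "56"},
]

def run_eval(model: Callable[[str], str], cases=CASES) -> dict:
    """Run every case through the model and grade by exact match."""
    results = []
    for case in cases:
        output = model(case["prompt"]).strip()
        results.append({"id": case["id"], "pass": output == case["expected"]})
    passed = sum(r["pass"] for r in results)
    return {"pass_rate": passed / len(results), "results": results}

if __name__ == "__main__":
    # Stub "model" so the harness runs offline; swap in a real client call.
    stub = lambda prompt: "2023" if "year" in prompt else "56"
    print(json.dumps(run_eval(stub), indent=2))
```

Even something this small, re-run on every model swap, beats an hour of manual prompting because the pass rate is comparable across versions.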
ibrahim_h 4 days ago |
Also context bleed into nano subagents in multi-model pipelines — I've seen orchestrators that just forward the entire message history by default (or something like messages[-N:] without any real budgeting), so your "cheap" extraction step suddenly runs with 30-50K tokens of irrelevant context. And then what's even the point, you've eaten the latency/cost win and added truncation risk on top.
Has anyone actually measured where that cutoff is in practice? At what context size nano stops being meaningfully cheaper/faster in real pipelines, not benchmarks.
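For comparison with the `messages[-N:]` anti-pattern above, here is a sketch of per-subagent token budgeting: always keep the system message, then add the most recent turns until an approximate budget is hit. The 4-chars-per-token estimate is a rough heuristic; a real pipeline would use the model's actual tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. Replace with a real tokenizer.
    return max(1, len(text) // 4)

def budget_context(messages, max_tokens=4000):
    """Keep system messages plus as many recent turns as fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(approx_tokens(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):  # walk newest-first
        cost = approx_tokens(m["content"])
        if used + cost > max_tokens:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

Unlike a fixed slice, this degrades gracefully: short turns let more history through, long turns get cut sooner, and the "cheap" extraction step never silently balloons to 30-50K tokens.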
powera 4 days ago |
For many "simple" LLM tasks, GPT-5-mini was sufficient 99% of the time. Hopefully these models handle even more, with accuracy closer to 100%.
The prices are up 2-4x compared to GPT-5-mini and nano. Were those models just loss leaders, or are these substantially larger/better?
pugchat 1 day ago |
Frontier models are expensive and used by developers who read terms of service. Mini models are cheap, embedded in apps, used by everyone—and users rarely understand what's being collected.
OpenAI's revenue model increasingly depends on high-volume, low-cost inference. That volume requires scale. Scale creates incentives to monetize the data flowing through the system.
The model quality race is mostly won. The next frontier is who accumulates the most behavioral data from everyday users. Mini releases are how you get there.
[Disclosure: I work with pugchat.ai, a privacy-focused AI aggregator—relevant bias]
tintor 4 days ago |
Did GPT write them?
XCSme 4 days ago |
5.4 mini seems to struggle with consistency, and even with temperature 0 sometimes gives the correct response, sometimes a wrong one...
[0]: https://aibenchy.com/compare/openai-gpt-5-4-medium/openai-gp...
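A small repeatability check for the flakiness described above: send the same prompt N times at temperature 0 and count distinct answers. The model callable here is a deterministic stub standing in for a real API client; against a genuinely deterministic backend every run should agree, so more than one distinct answer confirms the inconsistency.

```python
from collections import Counter

def repeatability(model, prompt: str, n: int = 10) -> Counter:
    """Tally the distinct answers the model gives for the same prompt."""
    return Counter(model(prompt).strip() for _ in range(n))

# With a deterministic stub, all n runs collapse to one answer:
stub = lambda prompt: "42"
print(repeatability(stub, "What is 6 * 7?", n=5))  # Counter({'42': 5})
```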
nicpottier 4 days ago |
GPT 5.4 mini is the first alternative that is both affordable and decent. Pretty impressed. On a $20 codex plan I think I'm pretty set and the value is there for me.
6thbit 4 days ago |
Is there any harness with an easy way to pick a model for a subagent based on the required context size the subagent may need?
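I don't know of a harness that does this out of the box, but the routing itself is simple: estimate the subagent's context size up front and pick the cheapest model whose budget fits. The model names and token limits below are purely illustrative, not real product limits.

```python
TIERS = [  # (context budget in tokens, model), ordered cheapest first
    (16_000, "gpt-5.4-nano"),
    (64_000, "gpt-5.4-mini"),
    (200_000, "gpt-5.4"),
]

def pick_model(estimated_tokens: int) -> str:
    """Return the cheapest model whose context budget covers the estimate."""
    for limit, model in TIERS:
        if estimated_tokens <= limit:
            return model
    return TIERS[-1][1]  # fall back to the largest tier
```

The hard part is the estimate, not the routing: you need to know the subagent's input size before dispatching, which is exactly what the budgeting discussion above is about.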
jbellis 4 days ago |
Preregistering my predictions:
Mini: better than Haiku but not as good as Flash 3, especially at reasoning=none.
Nano: worse than Flash 3 Lite. Probably better than Qwen 3.5 27b.
derefr 4 days ago |
I assume that OpenAI continues to use words like "mini" and "nano" in the names of these model variants to imply that they reserve the smallest possible resource-units of their inference clusters... but, given OpenAI's scale, that may well be "one B200" at this point, rather than anything consumers (or even most companies) could afford.
I ask because I'm curious whether the economics of these models' use-cases and call frequency work out (both from the customer perspective, and from OpenAI's perspective) in favor of OpenAI actually hosting inference on these models themselves, vs. it being better if customers (esp. enterprise customers) could instead license these models to run on-prem as black-box software appliances.
But of course, that question is only interesting / only has a non-trivial answer, if these models are small enough that it's actually possible to run them on hardware that costs less to acquire than a year's querying quota for the hosted version.
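A back-of-envelope version of that break-even question. Every number here is made up for illustration (hardware price, sustained throughput, blended token price); the point is only the shape of the arithmetic, not a real comparison.

```python
hardware_cost = 40_000   # hypothetical one-GPU server, USD
tokens_per_sec = 200     # hypothetical sustained throughput at full utilization
price_per_mtok = 4.50    # hypothetical blended $ per 1M tokens, hosted

tokens_per_year = tokens_per_sec * 60 * 60 * 24 * 365
hosted_cost_per_year = tokens_per_year / 1e6 * price_per_mtok
print(f"{tokens_per_year / 1e9:.1f}B tokens/yr, hosted ≈ ${hosted_cost_per_year:,.0f}/yr")
# On-prem pays for itself within a year only if hosted_cost_per_year > hardware_cost.
```

Under these made-up numbers the hosted bill comes in below the hardware cost even at 100% utilization, which illustrates why the question hinges entirely on whether the model is small enough to run on cheap hardware.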
reconnecting 4 days ago |
Seriously?
system2 4 days ago |
- Older GPT-5 Mini is about 55-60 tokens/s on API normally, 115-120 t/s when used with service_tier="priority" (2x cost).
- GPT-5.4 Mini averages about 180-190 t/s on API. Priority does nothing for it currently.
- GPT-5.4 Nano is at about 200 t/s.
To put this into perspective, Gemini 3 Flash is about 130 t/s on Gemini API and about 120 t/s on Vertex.
These are raw tokens/s for all models; reasoning tokens aren't excluded, but I ran the models with none/minimal effort where supported.
And quick price comparisons:
- Claude: Opus 4.6 is $5/$25, Sonnet 4.6 is $3/$15, Haiku 4.5 is $1/$5
- GPT: 5.4 is $2.5/$15 ($5/$22.5 for >200K context), 5.4 Mini is $0.75/$4.5, 5.4 Nano is $0.2/$1.25
- Gemini: 3.1 Pro is $2/$12 ($3/$18 for >200K context), 3 Flash is $0.5/$3, 3.1 Flash Lite is $0.25/$1.5
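The per-token prices listed above, turned into a quick cost calculator for a workload of N input / M output tokens. Prices are in $ per 1M tokens as quoted in the comment; the long-context surcharges for GPT 5.4 and Gemini 3.1 Pro are ignored for simplicity.

```python
PRICES = {  # model: ($ per 1M input tokens, $ per 1M output tokens)
    "opus-4.6": (5.00, 25.00), "sonnet-4.6": (3.00, 15.00), "haiku-4.5": (1.00, 5.00),
    "gpt-5.4": (2.50, 15.00), "gpt-5.4-mini": (0.75, 4.50), "gpt-5.4-nano": (0.20, 1.25),
    "gemini-3.1-pro": (2.00, 12.00), "gemini-3-flash": (0.50, 3.00),
    "gemini-3.1-flash-lite": (0.25, 1.50),
}

def cost(model: str, in_tok: int, out_tok: int) -> float:
    """Dollar cost of a workload, ignoring long-context surcharges."""
    p_in, p_out = PRICES[model]
    return (in_tok * p_in + out_tok * p_out) / 1e6

# e.g. 10M input / 2M output tokens per month, cheapest first:
for m in sorted(PRICES, key=lambda m: cost(m, 10_000_000, 2_000_000)):
    print(f"{m:22s} ${cost(m, 10_000_000, 2_000_000):8.2f}")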