Hacker news

Top
New
Past
Ask
Show
Jobs

GPT‑5.4 Mini and Nano (https://openai.com)

248 points by meetpateltech 4 days ago | 146 comments | View on ycombinator

Tiberium 4 days ago |

I checked the current speed over the API, and so far I'm very impressed. Of course models are usually not as loaded on the release day, but right now:

- Older GPT-5 Mini is about 55-60 tokens/s on API normally, 115-120 t/s when used with service_tier="priority" (2x cost).

- GPT-5.4 Mini averages about 180-190 t/s on API. Priority does nothing for it currently.

- GPT-5.4 Nano is at about 200 t/s.

To put this into perspective, Gemini 3 Flash is about 130 t/s on Gemini API and about 120 t/s on Vertex.

This is raw tokens/s for all models, it doesn't exclude reasoning tokens, but I ran models with none/minimal effort where supported.

And quick price comparisons:

- Claude: Opus 4.6 is $5/$25, Sonnet 4.6 is $3/$15, Haiku 4.5 is $1/$5

- GPT: 5.4 is $2.5/$15 ($5/$22.5 for >200K context), 5.4 Mini is $0.75/$4.5, 5.4 Nano is $0.2/$1.25

- Gemini: 3.1 Pro is $2/$12 ($3/$18 for >200K context), 3 Flash is $0.5/$3, 3.1 Flash Lite is $0.25/$1.5

BoumTAC 4 days ago |

To me, mini releases matter much more and better reflect the real progress than SOTA models.

The frontier models have become so good that it's getting almost impossible to notice meaningful differences between them.

Meanwhile, when a smaller / less powerful model releases a new version, the jump in quality is often massive, to the point where we can now use them 100% of the time in many cases.

And since they're also getting dramatically cheaper, it's becoming increasingly compelling to actually run these models in real-life applications.

pscanf 4 days ago |

I quite like the GPT models when chatting with them (in fact, they're probably my favorites), but for agentic work I only had bad experiences with them.

They're incredibly slow (via official API or openrouter), but most of all they seem not to understand the instructions that I give them. I'm sure I'm _holding them wrong_, in the sense that I'm not tailoring my prompt for them, but most other models don't have problem with the exact same prompt.

Does anybody else have a similar experience?

simonw 4 days ago |

Here's a grid of pelicans for the different models and reasoning levels: https://static.simonwillison.net/static/2026/gpt-5.4-pelican...

HugoDias 4 days ago |

According to their benchmarks, GPT 5.4 Nano > GPT-5-mini in most areas, but I'm noticing models are getting more expensive and not actually getting cheaper?

GPT 5 mini: Input $0.25 / Output $2.00

GPT 5 nano: Input: $0.05 / Output $0.40

GPT 5.4 mini: Input $0.75 / Output $4.50

GPT 5.4 nano: Input $0.20 / Output $1.25

mikkelam 4 days ago |

Why are we treating LLM evaluation like a vibe check rather than an engineering problem?

Most "Model X > Model Y" takes on HN these days (and everywhere) seem based on an hour of unscientific manual prompting. Are we actually running rigorous, version-controlled evals, or just making architectural decisions based on whether a model nailed a regex on the first try this morning?

ibrahim_h 4 days ago |

The OSWorld numbers are kinda getting lost in the pricing discussion but imo that's the most interesting part. Mini at 72.1% vs 72.4% human baseline is basically noise, so why not just use mini by default unless you're hitting specific failure modes.

Also context bleed into nano subagents in multi-model pipelines — I've seen orchestrators that just forward the entire message history by default (or something like messages[-N:] without any real budgeting), so your "cheap" extraction step suddenly runs with 30-50K tokens of irrelevant context. And then what's even the point, you've eaten the latency/cost win and added truncation risk on top.

Has anyone actually measured where that cutoff is in practice? At what context size nano stops being meaningfully cheaper/faster in real pipelines, not benchmarks.

powera 4 days ago |

I've been waiting for this update.

For many "simple" LLM tasks, GPT-5-mini was sufficient 99% of the time. Hopefully these models will do even more and closer to 100% accuracy.

The prices are up 2-4x compared to GPT-5-mini and nano. Were those models just loss leaders, or are these substantially larger/better?

pugchat 1 day ago |

The mini/nano tier is where the data equation gets interesting.

Frontier models are expensive and used by developers who read terms of service. Mini models are cheap, embedded in apps, used by everyone—and users rarely understand what's being collected.

OpenAI's revenue model increasingly depends on high-volume, low-cost inference. That volume requires scale. Scale creates incentives to monetize the data flowing through the system.

The model quality race is mostly won. The next frontier is who accumulates the most behavioral data from everyday users. Mini releases are how you get there.

[Disclosure: I work with pugchat.ai, a privacy-focused AI aggregator—relevant bias]

cbg0 4 days ago |

Based on the SWE-Bench it seems like 5.4 mini high is ~= GPT 5.4 low in terms of accuracy and price but the latency for mini is considerably higher at 254 seconds vs 171 seconds for GPT5.4. Probably a good option to run at lower effort levels to keep costs down for simpler tasks. Long context performance is also not great.

technocrat8080 4 days ago |

5.4 Mini's OSWorld score is a pleasant surprise. When SOTA scores were still ~30-40 models were too slow and inaccurate for realtime computer use agents (rip Operator/Agent). Curious if anyone's been using these in production.

tintor 4 days ago |

Several customer testimonials for GPT-5.4 Mini have em dashes in them.

Did GPT write them?

fastpdfai 4 days ago |

One thing I really want to find out, is which model and how to process TONS of pdfs very very fast, and very accurate. For prediction of invoice date, accrual accounting and other accounting related purposes. So a decent smart model that is really good at pdf and image reading. While still being very very fast.

morpheos137 4 days ago |

i switched to claude when i found chatgpt would argue with just about anything I said even when it was wrong. they have over optimised antisychophancy. i want a model that simulates critical thinking not one that repeats half baked often incomplete dogmas. the chatgpt 5x range is extraordinarily powerful but also extra ordinarily frustrating to try to use for anything creative or productive that is original in my opinion. claude basically is able to think critically while being neither sycophantic or argumentative most of the time in my option with appropriate user prompting. recent chat gpts seem to fight me every step of the way when not doing boiler plate. i don't want to waste my time fighting a tool.

XCSme 4 days ago |

It's odd, that on many benchmarks, including mine[0], Nano does better than Mini.

5.4 mini seems to struggle with consistency, and even with temperature 0 sometimes gives the correct response, sometimes a wrong one...

[0]: https://aibenchy.com/compare/openai-gpt-5-4-medium/openai-gp...

nicpottier 4 days ago |

I've been struggling on finding a reasonably priced model to use with my toy openclaw instance. Opus 4.6 felt kinda magical but that's just too expensive and I'm not risking my max subscription for it.

GPT 5.4 mini is the first alternative that is both affordable and decent. Pretty impressed. On a $20 codex plan I think I'm pretty set and the value is there for me.

ryao 4 days ago |

I will be impressed when they release the weights for these and older models as open source. Until then, this is not that interesting.

beklein 4 days ago |

As a big Codex user, with many smaller requests, this one is the highlight: "In Codex, GPT‑5.4 mini is available across the Codex app, CLI, IDE extension and web. It uses only 30% of the GPT‑5.4 quota, letting developers quickly handle simpler coding tasks in Codex for about one-third the cost." + Subagents support will be huge.

michaelgdwn 4 days ago |

The Nano tier is the one I'm watching. For agent workflows where you're making dozens of LLM calls per task, the cost per call matters more than peak capability. Would be interesting to see benchmarks on function calling latency specifically — that's what matters for agents.

dmix 4 days ago |

Last time I used GPT-5 mini it seems much slower than the primary GPT model API when we used it for an AI chat agent. Particularly around streaming the responses. But everything I've read implies it's supposed to be faster.

dack 4 days ago |

i want 5.4 nano to decide whether my prompt needs 5.4 xhigh and route to it automatically

6thbit 4 days ago |

Looking at the long context benchmark results for these, sounds like they are best fit for also mini-sized context windows.

Is there any harness with an easy way to pick a model for a subagent based on the required context size the subagent may need?

Rapzid 4 days ago |

Oh.. I thought maybe these would be upgrades to gpt-4.1 and gpt-4.1-mini and etc.. But the latency is way too high compared to the 400-600. Yeah, different models and etc but the naming is confusing.

AbstractH24 3 days ago |

Is anyone else getting numb to new model announcements?

bananamogul 4 days ago |

They could call them something like “sonnet” and “haiki” maybe.

jbellis 4 days ago |

Benchmarking these now.

Preregistering my predictions:

Mini: better than Haiku but not as good as Flash 3, especially at reasoning=none.

Nano: worse than Flash 3 Lite. Probably better than Qwen 3.5 27b.

beernet 4 days ago |

Crazy how OAI is way behind now and the only one to blame is Sam, his ego and lust for influence. Their downwards trajectory of paying accounts since "the move" (DoW deal) is an open secret. If you had placed a new CEO at OAI six months ago and told him to destroy the company, it would have been hard for that CEO to do a better job at that than Sam did. Should have left when he was let go but decided to go full Greg and MAGA instead. Here we are. Go Dario

simianwords 4 days ago |

why isn't nano available in codex? could be used for ingesting huge amount of logs and other such things

jerrygoyal 4 days ago |

Is GPT-5.4Mini drastically or marginally better for writing tasks as compared to GPT-5Mini?

xyproto 3 days ago |

OpenAI has "open" in the name without being anything similar to "open source". Additionally, they have not rejected using their technology for automatically killing people and for mass surveillance. I deleted my OpenAI account, and it felt good. Recommended.

machinecontrol 4 days ago |

What's the practical advantage of using a mini or nano model versus the standard GPT model?

kseniamorph 4 days ago |

wow, not bad result on the computer use benchmark for the mini model. for example, Claude Sonnet 4.6 shows 72.5%, almost on par with GPT-5.4 mini (72.1%). but sonnet costs 4x more on input and 3x more on output

yomismoaqui 4 days ago |

Not comparing with equivalent models from Anthropic or Google, interesting...

casey2 4 days ago |

I googled all the testimonial names and they are all linked-in mouthpieces.

varispeed 4 days ago |

I stopped paying attention to GPT-5.x releases, they seem to have been severely dumbed down.

derefr 4 days ago |

OpenAI don't talk about the "size" or "weights" of these models any more. Anyone have any insight into how resource-intensive these Mini/Nano-variant models actually are at this point?

I assume that OpenAI continue to use words like "mini" and "nano" in the names of these model variants, to imply that they reserve the smallest possible resource-units of their inference clusters... but, given OpenAI's scale, that may well be "one B200" at this point, rather than anything consumers (or even most companies) could afford.

I ask because I'm curious whether the economics of these models' use-cases and call frequency work out (both from the customer perspective, and from OpenAI's perspective) in favor of OpenAI actually hosting inference on these models themselves, vs. it being better if customers (esp. enterprise customers) could instead license these models to run on-prem as black-box software appliances.

But of course, that question is only interesting / only has a non-trivial answer, if these models are small enough that it's actually possible to run them on hardware that costs less to acquire than a year's querying quota for the hosted version.

reconnecting 4 days ago |

All three ChatGPT models (Instant, Thinking, and Pro) have a new knowledge cutoff of August 2025.

Seriously?

miltonlost 4 days ago |

[flagged]

system2 4 days ago |

I am feeling the version fatigue. I cannot deal with their incremental bs versions.