539 points by EvanZhouDev 4 days ago | 254 comments
camelmel 4 days ago |
bel8 4 days ago |
And this certainly won't bring me back to GitHub Copilot, which I cancelled yesterday.
GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs: https://www.reddit.com/r/GithubCopilot
I have since switched to DeepSeek Flash on high, which is Sonnet+ level for almost free.
If I feel I still need smarter models, I might sign up for $20/mo Codex to use GPT 5.5, which, in my opinion, is the best I can access right now.
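To see why a switch from per-request to per-token billing hurts long agentic sessions, here is a rough sketch with made-up numbers. The $0.04 per request and the $3 / $15 per 1M input/output tokens are hypothetical placeholders, not Copilot's actual rates; the point is only that one request which re-reads a large context many times stays cheap under flat pricing but grows with token count under metering.

```python
# Hypothetical comparison of per-request vs per-token billing.
# All rates and token counts below are illustrative assumptions,
# not GitHub Copilot's actual prices.

def per_request_cost(requests: int, price_per_request: float) -> float:
    """Flat price per request, regardless of how many tokens it consumes."""
    return requests * price_per_request


def per_token_cost(input_tokens: int, output_tokens: int,
                   input_per_m: float, output_per_m: float) -> float:
    """Price scales with tokens; rates are quoted per 1M tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# One long agentic request in which the agent reads ~2M input tokens in total.
flat = per_request_cost(1, 0.04)
metered = per_token_cost(2_000_000, 50_000, 3.00, 15.00)
print(f"per-request: ${flat:.2f}, per-token: ${metered:.2f}")
```

Under these assumed rates, the same request costs about 170x more when metered, which is the kind of jump that would explain the reaction on the subreddit.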
hmokiguess 4 days ago |
cwillu 4 days ago |
capten 4 days ago |
Why not sell it as a math agent? Why do I have to set up 4 agents to check each others' work?
AntiRush 4 days ago |
https://microsoft.ai/news/introducingmai-code-1-flash/
and the model card
https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF
The broader announcement of seven MAI models seems to be where the "5B active" figure in the title comes from:
https://microsoft.ai/news/building-a-hillclimbing-machine-la...
eterevsky 4 days ago |
efields 4 days ago |
That scroll effect is jank city for me (yeah yeah works fine in Chrome/Edge).
OsrsNeedsf2P 4 days ago |
tosh 4 days ago |
deckar01 4 days ago |
mentos 4 days ago |
Seems like going from a good system design to code is practically solved.
Now it’s a matter of designing the system. Or is that captured in these evals?
AJRF 4 days ago |
ajyoon 4 days ago |
jnwatson 4 days ago |
When I need a light model, I reach for Sonnet. It is nearly free on the max plans, and quite fast. I don't see a place for Haiku in regular coding.
Haiku I guess is when you need summarization/categorization at scale.
Microsoft setting Haiku as the benchmark is a low bar.
smcleod 4 days ago |
LoganDark 4 days ago |
jMyles 4 days ago |
But it seems like, by and large, even the faster models are now aimed at longer-running agentic flows and not sub-1s autocomplete. Or am I wrong about that?
ronbenton 4 days ago |
That sounds like something you say when you don't benchmark well
onlyrealcuzzo 4 days ago |
npn 4 days ago |
While the scores are not good compared to other open-weight models, the important thing to note is that their training data (as they claim) is very clean, without any synthetic datasets.
motoboi 3 days ago |
I understand the GitHub Copilot rollout takes time, but why can't we consume the models via Microsoft's own API right after launch?
Anthropic models are available on Foundry the moment they launch, but Microsoft's own models aren't.
Hfuffzehn 3 days ago |
Model Input Cached input Output
MAI-Code-1-Flash $0.75 $0.075 $4.50
Comparing to
Claude Haiku 4.5 $1.00 $0.10 $5.00
looks fine.
But they also forgot to include the benchmarks comparing to
GPT-5.4 mini $0.75 $0.075 $4.50
Those would have been helpful.
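A quick way to compare the list prices above on an actual workload. This sketch assumes the prices are USD per 1M tokens and uses a hypothetical workload of 10M input tokens (8M of them cache hits) and 0.5M output tokens; plug in your own numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input: float         # USD per 1M uncached input tokens (assumed unit)
    cached_input: float  # USD per 1M cached input tokens
    output: float        # USD per 1M output tokens


# Prices as quoted in the comment above.
PRICES = {
    "MAI-Code-1-Flash": Pricing(0.75, 0.075, 4.50),
    "Claude Haiku 4.5": Pricing(1.00, 0.10, 5.00),
    "GPT-5.4 mini": Pricing(0.75, 0.075, 4.50),
}


def workload_cost(p: Pricing, input_tokens: int, cached_tokens: int,
                  output_tokens: int) -> float:
    """Total USD cost; cached_tokens is the cached subset of input_tokens."""
    if cached_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * p.input
            + cached_tokens * p.cached_input
            + output_tokens * p.output) / 1_000_000


for name, p in PRICES.items():
    print(f"{name}: ${workload_cost(p, 10_000_000, 8_000_000, 500_000):.2f}")
```

Since MAI-Code-1-Flash and GPT-5.4 mini have identical list prices, they cost the same for any workload, so a head-to-head benchmark is the only thing that would separate them.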
giancarlostoro 4 days ago |
dang 4 days ago |
MAI-Thinking-1 - https://news.ycombinator.com/item?id=48374362 - June 2026 (64 comments)
XCSme 2 days ago |
Curious to test them and see how they perform.
mmaunder 4 days ago |
mekpro 3 days ago |
zoobab 3 days ago |
Still, there's no list or publication of the training data.
bguberfain 4 days ago |
tgtweak 3 days ago |
Why not showcase it against something in a similar class, like Qwen3.6 or Gemma 4?
mchl-mumo 4 days ago |
GaryBluto 4 days ago |
hootz 4 days ago |
ChicagoDave 3 days ago |
The eye-opener is clean licensed data with filters for AI content (not sure how you do that).
If MSFT builds on an ethical approach, there is a large anti-AI audience that might take note.
ruined 4 days ago |
schmorptron 3 days ago |
notenkidev 4 days ago |
ramaseshanms 3 days ago |
"MAI-Code-1-Flash outperforms Claude Haiku 4.5"
striking 4 days ago |
Computer0 4 days ago |
aubanel 3 days ago |
gslepak 4 days ago |
halapro 4 days ago |
cainxinth 4 days ago |
randomsc 4 days ago |
gruntled-worker 4 days ago |
Marciplan 4 days ago |
kylehotchkiss 4 days ago |
Why not assign them to make Windows good :D
tornikeo 4 days ago |
arunkant 4 days ago |
yieldcrv 4 days ago |
zb3 4 days ago |
ilia-a 4 days ago |
Nuspect 4 days ago |
freediddy 4 days ago |
mat0 4 days ago |
aikmack199 3 days ago |
nicogentile 2 days ago |
hanzeweiasa 4 days ago |
gbkgbk 3 days ago |
overfits-ai 3 days ago |
maxothex 3 days ago |
haeseong 3 days ago |
songting591 3 days ago |
overfits-ai 4 days ago |
pasrom 3 days ago |
AAYALAG 4 days ago |
Ozzie-D 4 days ago |
vancekai 4 days ago |
ghord 4 days ago |
pzo 4 days ago |
briangao 4 days ago |
hbwang2076 4 days ago |
fooker 4 days ago |
mattlondon 4 days ago |
Performance doesn't seem that good:
- MAI-Code-1-Flash (137B-A5B) = 51% on SWE-bench Pro
- Qwen3.6-35B-A3B = 49.5% on SWE-bench Pro (https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
They benchmark against Claude Haiku, but Haiku is not good; it's worse than tiny open models you can run locally or via API at 10% of the cost.
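A back-of-the-envelope check on the two models listed above, using only the figures quoted in the comment (parameter counts in billions, SWE-bench Pro scores in percent). It ignores price, latency, and benchmark variance, so treat it as a rough framing of the size-vs-score trade-off, nothing more.

```python
# Figures quoted in the comment: total/active parameters (billions) and SWE-bench Pro (%).
models = {
    "MAI-Code-1-Flash": {"total_b": 137, "active_b": 5, "swe_bench_pro": 51.0},
    "Qwen3.6-35B-A3B": {"total_b": 35, "active_b": 3, "swe_bench_pro": 49.5},
}

mai = models["MAI-Code-1-Flash"]
qwen = models["Qwen3.6-35B-A3B"]

total_ratio = mai["total_b"] / qwen["total_b"]            # how much bigger overall
active_ratio = mai["active_b"] / qwen["active_b"]         # how much more compute per token
score_gap = mai["swe_bench_pro"] - qwen["swe_bench_pro"]  # benchmark difference in points

print(f"{total_ratio:.1f}x total params, {active_ratio:.1f}x active params, "
      f"+{score_gap:.1f} points on SWE-bench Pro")
```

Roughly 3.9x the total parameters and 1.7x the active parameters for a 1.5-point gain is the gap the comment is pointing at.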