716 points by pember 2 days ago | 184 comments
ogou 1 day ago |
Not everyone is obsessed with code generation. There is a whole world out there.
upghost 1 day ago |
> Post-training methods allow teams to refine model behavior for specific tasks and environments.
How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?
There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
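The synthetic-data idea speculated about above can be sketched roughly like this: a stronger "teacher" model turns each seed company document into an SFT-style prompt/completion pair. Everything here is illustrative, `teacher_generate` is a stub, and the real pipeline (if any) is unknown; in practice that function would call an actual SOTA model.

```python
# Hedged sketch of "synthetic SFT data seeded from company data".
# All names and documents here are made up for illustration.
import json

seed_docs = [
    "Invoices are archived after 90 days.",
    "Support tickets auto-close after two weeks of inactivity.",
]

def teacher_generate(doc):
    # Stand-in for a call to a strong teacher model that would write a
    # question whose answer is grounded in the seed document.
    return {"prompt": f"What is our policy here? Context: {doc}",
            "completion": doc}

# SFT datasets are commonly stored as JSONL: one prompt/completion pair per line.
sft_rows = [teacher_generate(d) for d in seed_docs]
jsonl = "\n".join(json.dumps(r) for r in sft_rows)
print(jsonl)
```

Distilling through generated text like this loses information relative to true logit-level distillation, which is why it reads as "low resolution".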
jcmartinezdev 1 day ago |
I really like what they're doing, and I'll be watching them a lot more closely. I'd love to work for them, btw!
losvedir 1 day ago |
I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
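The RAG alternative mentioned above amounts to retrieving relevant documents at query time and pasting them into the prompt, rather than baking the knowledge into the weights. A minimal sketch, using a toy bag-of-words retriever (real systems use embedding models and a vector store; the documents here are invented):

```python
# Toy RAG sketch: retrieve the most similar document, then build a prompt.
from collections import Counter
import math

docs = [
    "The deploy script reads config from deploy.yaml.",
    "Fine-tuning changes model style more than factual knowledge.",
    "RAG injects retrieved documents into the prompt at query time.",
]

def vectorize(text):
    # Bag-of-words term counts; a stand-in for a sentence embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=1):
    qv = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, vectorize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # The retrieved text supplies the knowledge; the model supplies the voice.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG get knowledge into the model?"))
```

Because the knowledge lives in the document store, it can be updated without retraining, which is the usual argument for RAG over fine-tuning for facts.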
andai 1 day ago |
It's feasible for small models, but I thought small models were not reliable for factual information?
tho23i42342397 1 day ago |
I recall that even at Google - with its own search engine and so on - the best way to understand anything was to read the code or to reach out to the people who wrote it. I don't know how it works in places that deal with the "real world", like ASML.
Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code-documentation comes to mind).
Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.
Aldipower 1 day ago |
I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)
thecopy 1 day ago |
Disappointing.
krinne 1 day ago |
Would love to take it for a spin, if that is even possible.
supernes 1 day ago |
... for humans.
bsjshshsb 1 day ago |
Is it possible to retrain daily or hourly as info changes?
dragochat 1 day ago |
...learn a thing or two from NVIDIA or gtfo
vincentbusch about 22 hours ago |
The naming mess is wild, though. I ran into similar confusion trying to set up Mistral for a side project; I ended up just guessing which endpoint was the right one.
gpubridge 1 day ago |
I think devstral-latest should be it, no? So I wrote to support and got an answer 12 hours later that says, oh no, Devstral 2 is definitely called devstral 2, followed by a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it refers to don't exist and never did.
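One way to cut through this kind of alias confusion is to list what the API actually serves (OpenAI-style providers typically expose a model-listing endpoint) and resolve the "latest" name yourself. A minimal sketch; the model ids below are illustrative, assuming date-suffixed ids of the form `name-YYMM`, so check the provider's docs for the real names:

```python
# Hypothetical sketch: resolve a "-latest"-style alias against a model list.
# Model ids are made up; a real list would come from the provider's
# model-listing endpoint.
models = ["devstral-small-2505", "devstral-medium-2507", "mistral-large-2411"]

def resolve_latest(alias_prefix, available):
    # Pick the newest dated id matching the prefix, assuming ids end in YYMM.
    candidates = [m for m in available if m.startswith(alias_prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.rsplit("-", 1)[-1])

print(resolve_latest("devstral", models))
```

This at least turns "guess the endpoint name" into "inspect what the API reports", though it still can't tell you which id the marketing name refers to.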