716 points by pember 2 days ago | 184 comments
ogou 1 day ago |
Not everyone is obsessed with code generation. There is a whole world out there.
upghost 1 day ago |
> Post-training methods allow teams to refine model behavior for specific tasks and environments.
How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?
There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
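The synthetic-data idea speculated about above can be sketched roughly like this: a stronger "teacher" model turns each seed company document into an SFT-style prompt/completion pair. Everything here is illustrative, `teacher_generate` is a stub, and the real pipeline (if any) is unknown; in practice that function would call an actual SOTA model.

```python
# Hedged sketch of "synthetic SFT data seeded from company data".
# All names and documents here are made up for illustration.
import json

seed_docs = [
    "Invoices are archived after 90 days.",
    "Support tickets auto-close after two weeks of inactivity.",
]

def teacher_generate(doc):
    # Stand-in for a call to a strong teacher model that would write a
    # question whose answer is grounded in the seed document.
    return {"prompt": f"What is our policy here? Context: {doc}",
            "completion": doc}

# SFT datasets are commonly stored as JSONL: one prompt/completion pair per line.
sft_rows = [teacher_generate(d) for d in seed_docs]
jsonl = "\n".join(json.dumps(r) for r in sft_rows)
print(jsonl)
```

Distilling through generated text like this loses information relative to true logit-level distillation, which is why it reads as "low resolution".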
jcmartinezdev 1 day ago |
I really like what they're doing, and I'll be watching them a lot more closely. I'd love to work for them, btw!
losvedir 1 day ago |
I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
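The RAG alternative mentioned above amounts to retrieving relevant documents at query time and pasting them into the prompt, rather than baking the knowledge into the weights. A minimal sketch, using a toy bag-of-words retriever (real systems use embedding models and a vector store; the documents here are invented):

```python
# Toy RAG sketch: retrieve the most similar document, then build a prompt.
from collections import Counter
import math

docs = [
    "The deploy script reads config from deploy.yaml.",
    "Fine-tuning changes model style more than factual knowledge.",
    "RAG injects retrieved documents into the prompt at query time.",
]

def vectorize(text):
    # Bag-of-words term counts; a stand-in for a sentence embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=1):
    qv = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, vectorize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # The retrieved text supplies the knowledge; the model supplies the voice.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG get knowledge into the model?"))
```

Because the knowledge lives in the document store, it can be updated without retraining, which is the usual argument for RAG over fine-tuning for facts.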
andai 1 day ago |
It's feasible for small models, but I thought small models were not reliable for factual information?
tho23i42342397 1 day ago |
I recall that even at Google - with its own search engine and so on - the best way to understand anything was to read the code or to reach out to the people who wrote it. I don't know how it works in places that deal with the "real world", like ASML.
Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code-documentation comes to mind).
Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.
Aldipower 1 day ago |
I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)
thecopy 1 day ago |
Disappointing.
krinne 1 day ago |
Would love to take it for a spin, if that is even possible.
supernes 1 day ago |
... for humans.
bsjshshsb 1 day ago |
Is it possible to retrain daily or hourly as info changes?
dragochat 1 day ago |
...learn a thing or two from NVIDIA or gtfo
vincentbusch about 22 hours ago |
The naming mess is wild, though. I ran into similar confusion trying to set up Mistral for a side project; I ended up just guessing which endpoint was the right one.
gpubridge 1 day ago |
I think devstral-latest should be it, no? So I wrote to support and got an answer 12 hours later that says, oh no, Devstral 2 is definitely called devstral 2, followed by a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it refers to don't exist and never did.
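One way to cut through this kind of alias confusion is to list what the API actually serves (OpenAI-style providers typically expose a model-listing endpoint) and resolve the "latest" name yourself. A minimal sketch; the model ids below are illustrative, assuming date-suffixed ids of the form `name-YYMM`, so check the provider's docs for the real names:

```python
# Hypothetical sketch: resolve a "-latest"-style alias against a model list.
# Model ids are made up; a real list would come from the provider's
# model-listing endpoint.
models = ["devstral-small-2505", "devstral-medium-2507", "mistral-large-2411"]

def resolve_latest(alias_prefix, available):
    # Pick the newest dated id matching the prefix, assuming ids end in YYMM.
    candidates = [m for m in available if m.startswith(alias_prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.rsplit("-", 1)[-1])

print(resolve_latest("devstral", models))
```

This at least turns "guess the endpoint name" into "inspect what the API reports", though it still can't tell you which id the marketing name refers to.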