134 points by j0e1 5 days ago | 43 comments
gojomo 1 day ago |
Can't achieve subject-verb agreement in 1st sentence of their English abstract.
Advances made through No Language Left Behind (NLLB) have demonstrated that high-quality machine translation (MT) scale to 200 languages.
ks2048 1 day ago |
I'm currently concentrating on better data gathering for low-resource languages.
When you look in detail at datasets like Common Crawl, finepdfs, and fineweb, (1) they are missing quality data sources that are out there if you know where to look, and (2) the sources they do have are not processed "finely" enough (e.g., finepdfs classifies each page of a PDF as having a single language, whereas many language-learning sources contain language pairs, etc.).
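To illustrate the second point, a minimal sketch of per-line language detection that would flag bilingual pages a page-level classifier misses. It assumes the public fastText language-ID model (lid.176.bin); the path and thresholds are just placeholders:

    import fasttext

    # Assumes the public fastText language-ID model has been downloaded:
    # https://fasttext.cc/docs/en/language-identification.html
    model = fasttext.load_model("lid.176.bin")

    def page_languages(page_text, min_conf=0.5):
        """Return the set of languages detected line-by-line on one page."""
        langs = set()
        for line in page_text.splitlines():
            line = line.strip()
            if len(line) < 20:  # skip fragments too short to classify reliably
                continue
            labels, probs = model.predict(line)
            if probs[0] >= min_conf:
                langs.add(labels[0].replace("__label__", ""))
        return langs

    # A page from a language-learning textbook should come back with two
    # codes, e.g. {"en", "km"}, not the single label a page-level tag gives.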
djoldman 1 day ago |
Is it open weight? If so, why isn't there just a straight link to the models?
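For the earlier NLLB-200 release the weights are openly hosted on Hugging Face; whether this new model gets the same treatment isn't clear. A minimal sketch of loading that older checkpoint with transformers (the target language code here is just an example):

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # The NLLB-200 checkpoints from the earlier paper are on Hugging Face;
    # this loads the smallest distilled variant.
    name = "facebook/nllb-200-distilled-600M"
    tokenizer = AutoTokenizer.from_pretrained(name, src_lang="eng_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    inputs = tokenizer("High-quality machine translation scales to 200 languages.",
                       return_tensors="pt")
    # NLLB expects the target language code as the forced first generated token.
    out = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("khm_Khmr"),
        max_length=64,
    )
    print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])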
intended 1 day ago |
It looks like Meta found a way forward.
Reading Meta's abstract, it seems they have found ways to improve the quality of the training data, and have also built new evaluation tools?
They are also saying that OMT-LLaMA does a better job at text generation than other baseline models.
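The abstract doesn't spell out the evaluation setup, but the original NLLB work leaned on chrF++ over the FLORES-200 benchmark. A minimal sketch of scoring system output that way with sacrebleu (the sentences are placeholders):

    from sacrebleu.metrics import CHRF

    # chrF++ (word_order=2) was the primary metric in the original NLLB work;
    # real evaluation would run over a full test set like FLORES-200.
    hypotheses = ["The cat sits on the mat."]
    references = [["The cat is sitting on the mat."]]

    chrf = CHRF(word_order=2)  # the chrF++ variant
    print(chrf.corpus_score(hypotheses, references))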
bikeshaving 1 day ago |
https://www.amnesty.org/en/latest/news/2025/02/meta-new-poli...
https://www.amnesty.org/en/latest/news/2023/10/meta-failure-...
ks2048 1 day ago |
Google Translate is a good default, but LLMs are really good at translation, as they're better at understanding context and providing culturally appropriate translations.
(I live in Cambodia, where they speak Khmer.)
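To make the contrast concrete, a rough sketch of context-aware prompting. The OpenAI client and model name are purely illustrative; any chat-capable LLM would do:

    from openai import OpenAI

    client = OpenAI()  # illustrative; any chat-capable LLM API works

    def translate(text, context, target="Khmer"):
        """Translate with surrounding context so the model can resolve
        pronouns, register, and honorifics that sentence-level MT misses."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system",
                 "content": f"Translate the user's text into {target}. "
                            "Use the provided context to choose register and "
                            "culturally appropriate phrasing. "
                            "Output only the translation."},
                {"role": "user",
                 "content": f"Context: {context}\n\nText: {text}"},
            ],
        )
        return resp.choices[0].message.content

    print(translate("See you then!",
                    context="Closing line of a formal business email."))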