Hacker news

Top
New
Past
Ask
Show
Jobs

Meta's Omnilingual MT for 1,600 Languages (https://ai.meta.com)

134 points by j0e1 5 days ago | 43 comments | View on ycombinator

stingraycharles 1 day ago |

I find that meta’s translations are very poor compared to others, at least for relatively obscure languages, which I figured was relevant considering the article.

Google Translate is a good default, but LLMs are really good at translations, as they’re better capable at understanding context and providing culturally appropriate translations.

(I live in Cambodia where they speak Khmer)

gojomo 1 day ago |

Can translate between 1600 languages.

Can't achieve subject-verb agreement in 1st sentence of their English abstract.

Advances made through No Language Left Behind (NLLB) have demonstrated that high-quality machine translation (MT) scale to 200 languages.

ks2048 1 day ago |

I'll be looking at this in detail. I've started a company to do similar things, https://6k.ai

I'm currently concentrating on better data gathering for low-resource languages.

When you look in detail at data like Common Crawl, finepdfs, and fineweb, (1) they are really lacking quality data sources if you know where to look, and (2) the sources they have are not processed "finely" enough (e.g. finepdfs classify each page of PDF as having a specific language, where-as many language learning sources have language pairs, etc.

djoldman 1 day ago |

Just spent a long time trying to find where you can download any of these weights.

Is it open weight? If so, why isn't there just a straight link to the models?

mrlonglong about 3 hours ago |

It can't even do decent Welsh to English translations.

garyclarke27 1 day ago |

They can translate 1600 languages, but they cannot do basic text formatting, where are the paragraphs?

pxtail 1 day ago |

Where are real, useful features, why in 2026 can't I get transcript of voice messages in my chat?

psychoslave 1 day ago |

That's a high count, but still a bit away from "Omni". Usual count is between 4k and 8k depending the source. But the first 1k might be the hardest, certainly.

ks2048 1 day ago |

Another interesting thing mentioned here is: BOUQuET: Benchmark and Open-initiative for Universal Quality Evaluation in Translation.

https://huggingface.co/spaces/facebook/bouquet

intended 1 day ago |

Didn’t research show that models get worse at translation the more languages get added in? The curse of multilinguality? Lauscher 2020?

It looks like meta found a way forward.

Reading meta’s abstract, it seems that they have found ways to improve the quality of the training data, and also new evaluation tools?

They are also saying that OMT-LLaMA does a better job at text generation than other baseline models.

asveikau about 19 hours ago |

They've come a long way since enabling Burmese genocide citing lack of available translations.

croes 1 day ago |

Off topic, since the AI craze MS‘ documentation translation has ridiculous errors like translating try catch keywords to "versuchen" and "fangen" for German pages

lzhgusapp about 17 hours ago |

[dead]

rowanseerwald 1 day ago |

[dead]

tempaccountabgd about 21 hours ago |

[dead]

ath3nd 1 day ago |

[dead]

true21733 5 days ago |

[dead]

bikeshaving 1 day ago |

I’m very wary of celebrating Meta’s language work when the company was credibly found to have contributed to the genocide against the Rohingya in Myanmar, and separately, to human rights abuses against Tigrayans during the conflict in northern Ethiopia. Be careful whose sins you’re laundering.

https://www.amnesty.org/en/latest/news/2025/02/meta-new-poli... https://www.amnesty.org/en/latest/news/2023/10/meta-failure-...

ks2048 1 day ago |

Meta released No Language Left Behind (NLLB) [1], I think in 2022. I wonder why this in not "NLLB 2.0"? These companies love introducing new names to confuse things

[1] https://ai.meta.com/research/no-language-left-behind/