MANILA, Philippines – Meta announced on Tuesday, August 22, it had come out with “the first all-in-one multilingual multimodal AI (artificial intelligence) translation and transcription model.”

According to a statement from Meta, this model, called SeamlessM4T, “can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages depending on the task.”

SeamlessM4T can recognize 100 languages, performing speech-to-text and text-to-text translation across all of them. It also does speech-to-speech translation, taking spoken input in any of the 100 languages and producing spoken translations in 36. Lastly, it handles text-to-speech translation, supporting “nearly 100 input languages and 35 (plus English) output languages.”

Meta is publicly releasing SeamlessM4T under a research license so researchers and developers can build on what’s been made so far. Alongside this, Meta is also releasing the metadata of SeamlessAlign, which it calls “the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments.”

Ars Technica, in its report, notes a wrinkle in the research paper explaining how SeamlessM4T works: it was vague about where the data used to train the artificial intelligence came from. The paper said Meta’s researchers “created a multimodal corpus of automatically aligned speech translations of more than 470,000 hours”; this is the SeamlessAlign dataset. Meta said it then “filtered a subset of this corpus with human-labeled and pseudo-labeled data, totaling 406,000 hours.”