Google Translate adds two more South African languages


Google Translate, which can instantly translate text, images, speech, websites and video from one language to another, has added TshiVenda and siSwati to its language suite, bringing the total number of official South African languages supported to nine.

“Approximately 1.2 million South Africans speak TshiVenda, while siSwati, an Nguni language, is spoken by around 1.5 million people living in Eswatini and South Africa. Their addition brings to nine the number of South African languages available on Google Translate, with Afrikaans, English, Sepedi, Sesotho, isiXhosa and isiZulu having been added in previous expansions,” said Google in a statement on Thursday.

The addition of the two languages is part of a 110-language expansion of Google Translate, which makes use of Google’s PaLM 2 large language model. According to the statement, the expansion also represents the largest addition of African languages, with more than a quarter of those added coming from the continent.

Among the other languages included in the latest roll-out are:

  • Afar: A tonal language spoken in Djibouti, Eritrea and Ethiopia. Of all the languages in this launch, Afar had the most volunteer community contributions.
  • Cantonese: One of the most requested languages for Google Translate. Because Cantonese often overlaps with Mandarin in writing, it’s tricky to find data and train models.
  • Manx: A Celtic language of the Isle of Man. It almost went extinct with the death of its last native speaker in 1974. But thanks to an island-wide revival movement, there are now thousands of speakers.
  • NKo: A standardised form of the West African Manding languages that unifies many dialects into a common language. Its unique alphabet was invented in 1949, and it has an active research community that develops resources and technology for it today.
  • Punjabi (Shahmukhi): A variety of Punjabi written in Perso-Arabic script (Shahmukhi). This is the most spoken language in Pakistan.
  • Tamazight (Amazigh): A Berber language spoken across North Africa. Although there are many dialects, the written form is generally mutually understandable. It’s written in Latin script and Tifinagh script, both of which Google Translate supports.
  • Tok Pisin: An English-based creole and the lingua franca of Papua New Guinea.

“A lot of consideration goes into new language additions for Google Translate, ranging from which languages to include to the use of specific spellings,” said Siya Madikane, communications manager at Google South Africa.

“Many languages do not have a single, standard form, so learning the specific dialect that is spoken the most in an area is more feasible. Our approach has been to prioritise the most commonly used varieties of each language.”  — © 2024 NewsCentral Media
