Machine Translation for African Languages: Community Creation of Datasets and Models in Uganda
Date
2022
Publisher
3rd Workshop on African Natural Language Processing
Abstract
Reliable machine translation systems are only available for a small proportion of
the world’s languages, the key limitation being a shortage of training and evaluation
data. We provide a case study in the creation of such resources by NLP
teams who are local to the communities in which these languages are spoken. A
parallel text corpus, SALT, was created for five Ugandan languages (Luganda,
Runyankole, Acholi, Lugbara and Ateso) and various methods were explored to
train and evaluate translation models. The resulting models were found to be
effective for practical translation applications, even for those languages with no
previous NLP data available, achieving a mean BLEU score of 26.2 for translations
to English, and 19.9 from English. The SALT dataset and models described are
publicly available at
Citation
Akera, B., Mukiibi, J., Naggayi, L. S., Babirye, C., Owomugisha, I., Nsumba, S., ... & Quinn, J. (2022, March). Machine Translation For African Languages: Community Creation Of Datasets And Models In Uganda. In 3rd Workshop on African Natural Language Processing.