Building a Parallel Corpus and Training Translation Models Between Luganda and English

dc.contributor.author: Kimera, Richard
dc.contributor.author: Rim, Daniela N.
dc.contributor.author: Choi, Heeyoul
dc.date.accessioned: 2023-01-27T16:03:21Z
dc.date.available: 2023-01-27T16:03:21Z
dc.date.issued: 2023
dc.description.abstract: Neural machine translation (NMT) has achieved great success with large datasets, so it largely favors high-resource languages. This continues to disadvantage low-resource languages such as Luganda, which lack high-quality parallel corpora; even Google Translate did not support Luganda at the time of writing. In this paper, we build a parallel corpus of 41,070 sentence pairs for Luganda and English, based on three different open-source corpora. We then train NMT models on the dataset with a hyper-parameter search. Our experiments yield BLEU scores of 21.28 from Luganda to English and 17.47 from English to Luganda. Sample translations show high translation quality. We believe that our model is the first Luganda-English NMT model. The bilingual dataset we built will be made publicly available. [en_US]
dc.identifier.citation: Kimera, R., Rim, D. N., & Choi, H. (2023). Building a Parallel Corpus and Training Translation Models Between Luganda and English. arXiv preprint arXiv:2301.02773. https://doi.org/10.48550/arXiv.2301.02773 [en_US]
dc.identifier.uri: https://nru.uncst.go.ug/handle/123456789/7349
dc.language.iso: en [en_US]
dc.publisher: arXiv preprint arXiv [en_US]
dc.subject: Luganda, neural machine translation, Transformer, hyper-parameter [en_US]
dc.title: Building a Parallel Corpus and Training Translation Models Between Luganda and English [en_US]
dc.type: Article [en_US]
Files
Original bundle
Name: Building a Parallel Corpus and Training Translation Models Between Luganda and English.pdf
Size: 495.5 KB
Format: Adobe Portable Document Format
Description: Building a Parallel Corpus and Training Translation Models Between Luganda and English