The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition

dc.contributor.author: Mukiibi, Jonathan
dc.contributor.author: Katumba, Andrew
dc.contributor.author: Nakatumba-Nabende, Joyce
dc.contributor.author: Hussein, Ali
dc.contributor.author: Meyer, Josh
dc.date.accessioned: 2022-12-01T18:58:20Z
dc.date.available: 2022-12-01T18:58:20Z
dc.date.issued: 2022
dc.description.abstract: Building a usable automatic speech recognition (ASR) system for radio monitoring is challenging for under-resourced languages, yet such a system is paramount in societies where radio is the main medium of public communication and discussion. Initial efforts by the United Nations in Uganda have shown that understanding the perceptions of rural people who are excluded from social media is important for national planning. However, these efforts are hindered by the absence of transcribed speech datasets. In this paper, the Makerere Artificial Intelligence Research Lab releases a 155-hour Luganda radio speech corpus. To our knowledge, this is the first publicly available radio dataset in sub-Saharan Africa. The paper describes the development of the voice corpus and presents baseline Luganda ASR performance results using the Coqui STT toolkit, an open-source speech recognition toolkit. [en_US]
dc.identifier.citation: Mukiibi, J., Katumba, A., Nakatumba-Nabende, J., Hussein, A., & Meyer, J. (2022). The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition. arXiv preprint arXiv:2206.09790. [en_US]
dc.identifier.uri: https://arxiv.org/abs/2206.09790
dc.identifier.uri: https://nru.uncst.go.ug/handle/123456789/5634
dc.language.iso: en [en_US]
dc.publisher: arXiv [en_US]
dc.subject: Speech corpus [en_US]
dc.subject: Luganda radio [en_US]
dc.subject: Automatic speech recognition [en_US]
dc.title: The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition [en_US]
dc.type: Article [en_US]
Files
Original bundle
Name: The Makerere Radio Speech Corpus.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format
Description: Article
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission