The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition
Date
2022
Publisher
arXiv
Abstract
Building a usable radio-monitoring automatic speech recognition (ASR) system is challenging for under-resourced
languages, yet such a system is paramount in societies where radio is the main medium of public communication and discussion.
Initial efforts by the United Nations in Uganda have shown how important it is for national planning to understand the
perceptions of rural people who are excluded from social media. However, these efforts are hampered by the absence of
transcribed speech datasets. In this paper, the Makerere Artificial Intelligence research lab releases a Luganda radio speech
corpus of 155 hours. To our knowledge, this is the first publicly available radio dataset in sub-Saharan Africa. The paper
describes the development of the voice corpus and presents baseline Luganda ASR performance results using the Coqui STT
toolkit, an open-source speech recognition toolkit.
Keywords
Speech corpus, Luganda radio, Automatic speech recognition
Citation
Mukiibi, J., Katumba, A., Nakatumba-Nabende, J., Hussein, A., & Meyer, J. (2022). The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition. arXiv preprint arXiv:2206.09790.