Browsing by Author "Mukiibi, Jonathan"

Building Text and Speech Benchmark Datasets and Models for Low-Resourced East African Languages: Experiences and Lessons
(Applied AI Letters, 2025-03-26) Nakatumba-Nabende, Joyce; Nabende, Peter; Mukiibi, Jonathan; Mutebi, Chodrine; Katumba, Andrew
Africa has over 2000 languages; however, those languages are not well represented in the existing natural language processing ecosystem. African languages lack essential digital resources to effectively engage in advancing language technologies. There is a need to generate high-quality natural language processing resources for low-resourced African languages. Obtaining high-quality speech and text data is expensive and tedious because it can involve manual sourcing and verification of data sources. This paper discusses the process taken to curate and annotate text and speech datasets for five East African languages: Luganda, Runyankore-Rukiga, Acholi, Lumasaba, and Swahili. We also present results obtained from baseline models for machine translation, topic modeling and classification, sentiment classification, and automatic speech recognition tasks. Finally, we discuss the experiences, challenges, and lessons learned in creating the text and speech datasets.

Machine Learning Analysis of Radio Data to Uncover Community Perceptions on the Ebola Outbreak in Uganda
(ACM Journal on Computing and Sustainable Societies, 2024-09-16) Nakatumba-Nabende, Joyce; Mukiibi, Jonathan; Bateesa, Tobius Saul; Murindanyi, Sudi; Katumba, Andrew; Mutebi, Chodrine
Radio is vital for people, especially in rural areas, to share their concerns through interactive talk shows. Understanding public perceptions of pandemics is crucial because they influence people’s attitudes and health-seeking behaviors. This study used machine learning to analyze English and Luganda radio broadcast data to understand public perceptions and perspectives on the Ebola outbreak in Uganda. Our findings revealed three main speaker categories: media personalities, community guests and listeners, and government officials. The government made the most significant effort to educate the public about the Ebola outbreak. The analysis showed that the community was hesitant to use Ebola vaccines, believing that they had not been tested on other populations where the Ebola virus had originated. The community was also concerned about the effects of the lockdown measures imposed during the COVID-19 pandemic. The analysis of the radio broadcast data revealed differences in the timing and content of the conversations between male and female speakers. These experiences can inform population-specific policies for handling ongoing and future pandemics.

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
(arXiv preprint arXiv, 2023-05-23) Dione, Cheikh M. Bamba; Nabende, Peter; Mukiibi, Jonathan; Uchechukwu, Chinedu; Abdullahi, Muhammad; Klakow, Dietrich
In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the UD (universal dependencies) guidelines. We conducted extensive POS baseline experiments using conditional random field and several multilingual pretrained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems more effective for POS tagging in unseen languages.