Browsing by Author "Nakatumba-Nabende, Joyce"
Now showing 1 - 20 of 23
Agile Islands in a Waterfall Environment: Challenges and Strategies in Automotive (2020)
Kasauli, Rashidah; Knauss, Eric; Nakatumba-Nabende, Joyce; Kanagwa, Benjamin
Driven by the need for faster time-to-market and reduced development lead-time, large-scale systems engineering companies are adopting agile methods in their organizations. This agile transformation is challenging, and adoption commonly starts bottom-up, with agile software teams working within traditional company structures. This creates the challenge of agile teams working within a document-centric and plan-driven (or waterfall) environment. While it may be desirable to take the best of both worlds, it is not clear how this can be achieved, especially with respect to managing requirements in large-scale systems. This paper presents an exploratory case study focusing on two departments of a large-scale systems engineering company (automotive) that is in the process of company-wide agile adoption. We present challenges that agile teams face while working within a larger plan-driven context and propose potential strategies to mitigate them. The challenges include, e.g., development teams not being aware of the high-level requirements, difficulties in managing changes to these requirements, and difficulties in relating them to backlog items such as user stories. While we found strategies for solving most of the challenges, they remain abstract, and empirical research on their effectiveness is currently lacking.

Building Text and Speech Benchmark Datasets and Models for Low-Resourced East African Languages: Experiences and Lessons (Applied AI Letters, 2025-03-26)
Nakatumba-Nabende, Joyce; Nabende, Peter; Mukiibi, Jonathan; Mutebi, Chodrine; Katumba, Andrew
Africa has over 2,000 languages; however, these languages are poorly represented in the existing natural language processing ecosystem.
African languages lack the essential digital resources needed to participate effectively in advancing language technologies. There is a need to generate high-quality natural language processing resources for low-resourced African languages. Obtaining high-quality speech and text data is expensive and tedious because it can involve manual sourcing and verification of data sources. This paper discusses the process used to curate and annotate text and speech datasets for five East African languages: Luganda, Runyankore-Rukiga, Acholi, Lumasaba, and Swahili. We also present results obtained from baseline models for machine translation, topic modeling and classification, sentiment classification, and automatic speech recognition tasks. Finally, we discuss the experiences, challenges, and lessons learned in creating the text and speech datasets.

Catching up with Method and Process Practice: An Industry-Informed Baseline for Researchers (IEEE, 2019)
Klunder, Jil; Hebig, Regina; Tell, Paolo; Kuhrmann, Marco; Nakatumba-Nabende, Joyce; Heldal, Rogardt; Prikladnicki, Rafael; Tuzun, Eray; Pfahl, Dietmar; Schneider, Kurt; MacDonell, Stephen G.
Software development methods are usually not applied by the book. Companies are under pressure to continuously deploy software products that meet market needs and stakeholders’ requests. To implement efficient and effective development processes, companies use multiple frameworks, methods and practices, and combine these into hybrid methods. A common combination consists of a rich management framework to organize and steer projects, complemented by a number of smaller practices that provide the development teams with tools to complete their tasks. In this paper, based on 732 data points collected through an international survey, we study how software development processes are used in practice. Our results show that 76.8% of the companies implement hybrid methods.
Company size, as well as the strategy for devising and evolving hybrid methods, affects how suitable the chosen process is for reaching company or project goals. Our findings show that companies that combine planned improvement programs with process evolution can increase the suitability of their processes by up to 5%.

Comparison of Occurrence of Design Smells in Desktop and Mobile Applications (ACSE, 2020)
Ogenrwot, Daniel; Nakatumba-Nabende, Joyce; Chaudron, Michel R.V.
Design smells are symptoms of poor solutions to recurring design problems in a software system. These symptoms have a direct negative impact on software quality by making the system difficult to comprehend and maintain. In this paper we compare the occurrence of design smells across two technological ecosystems: Windows/desktop and Android/mobile. This knowledge is significant for software maintenance activities such as program quality assurance and refactoring. To supplement previous findings, our study aimed at (a) understanding whether and how the relationships among design smells differ between Windows and mobile applications and (b) determining the groups of design smells that tend to occur together frequently and the magnitude of their occurrence in Windows and mobile applications. In this study, we explored the use of statistics and unsupervised learning on a dataset of twelve (12) Java-based open-source projects mined from GitHub. We identified the fifteen (15) most frequent design smells across desktop and mobile applications. Additionally, a clustering technique revealed which groups of design smells often co-occur.
Specifically, {SpeculativeGenerality, SwissArmyKnife} and {LongParameterList, ClassDataShouldBePrivate} were observed to co-occur frequently in both desktop and mobile applications.

A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data (AfricaNLP workshop, 2022)
Bateesa, Tobius Saul; Babirye, Claire; Nakatumba-Nabende, Joyce
Manually extracting functional themes and topics from a large text corpus is usually infeasible. There is a need for text mining techniques such as topic modeling, which provide a mechanism to infer topics from a text corpus automatically. This paper discusses topic modeling and topic classification models on Luganda text data. For topic modeling, we considered Non-negative Matrix Factorization (NMF), an unsupervised machine learning algorithm that extracts hidden patterns from unlabeled text data to create latent topics; for topic classification, we considered classic approaches, neural networks, and pretrained algorithms. Bidirectional Encoder Representations from Transformers (BERT), a pretrained model whose attention mechanism learns contextual relations between words (or sub-words) in a text, and a Support Vector Machine (SVM), a classic model that analyzes particular properties of learning within text data, recorded the best results for topic classification. Our results indicate that topic modeling and topic classification algorithms produce relatively similar results when the topic classification algorithms are trained on a balanced dataset.

A dataset of cassava whitefly count images (Data in Brief, 2022)
Nakatumba-Nabende, Joyce; Tusubira, Jeremy Francis; Babirye, Claire; Nsumba, Solomon; Omongo Abu, Christopher
Whiteflies are insect vectors that affect a variety of plants such as tomatoes, cabbages, sweet potatoes, eggplants, and cassava. In Uganda, whiteflies are a major contributor to the spread of Cassava Brown Streak Disease (CBSD).
By feeding on infected cassava plants, whiteflies can transfer the Cassava Brown Streak Virus, which causes CBSD, to nearby uninfected plants when they migrate. When they attack cassava plants in large numbers, whiteflies can also cause significant physical damage through feeding, which can eventually lead to leaf loss or plant death. Whiteflies also excrete “honeydew”, which harbors a fungus known as “sooty mold” that covers the leaves, limiting access to sunlight, which in turn affects the plant’s food production. As part of their work, cassava breeders often assess whitefly populations in cassava fields through a manual process of visual inspection, which can be arduous and time-consuming. This paper presents a cassava whitefly dataset that has been curated to enable researchers to build solutions that automate the detection and counting of whiteflies. The dataset contains 3,000 images captured at a whitefly trial site in Uganda. It depicts different levels of whitefly infestation, from low to high. This data has already been used to provide a proof-of-concept solution for whitefly counting based on machine learning approaches.

A dataset of necrotized cassava root cross-section images (Data in Brief, 2020)
Nakatumba-Nabende, Joyce; Akera, Benjamin; Tusubira, Jeremy Francis; Nsumba, Solomon; Mwebaze, Ernest
Cassava brown streak disease is a major disease affecting cassava. Along with foliar chlorosis and stem lesions, a very common symptom of cassava brown streak disease is the development of a dry, brown, corky rot within the starch-bearing tuberous roots, known as necrosis. This paper presents a dataset of curated images of necrosis-bearing roots across different cassava varieties. The dataset contains images of cassava root cross-sections from trial harvests in Uganda and Tanzania. The images were taken using a smartphone camera.
The resulting dataset consists of 10,052 images, making it the largest publicly available dataset for crop root necrosis. The data is comprehensive and covers different variations of necrosis expression, including root cross-section types, the number of necrosis lesions, and the presentation of the necrosis lesions. The dataset can be used to train machine learning models that quantify the percentage of cassava root damage caused by necrosis.

Evaluation of accessibility standards on Ugandan e-government websites (An International Journal, 2019)
Nakatumba-Nabende, Joyce; Kanagwa, Benjamin; Nameere Kivunike, Florence; Tuape, Michael
In spite of the efforts made by the Government of Uganda through the National IT Authority Uganda (NITA-U) to provide many government services online, web accessibility is still not considered a major factor by the developers of e-government websites. As a result, people with disabilities cannot use these websites as effectively as people without disabilities. The main objective of this study was therefore to evaluate the extent to which Ugandan e-government websites meet the internationally accepted WCAG 2.0 standards. The analysis covered 63 websites belonging to government ministries, departments and agencies. Website accessibility was assessed using two automatic evaluation tools: TAW and AChecker. The results presented in this paper indicate that none of the websites satisfy the level AA accessibility guidelines.
Although NITA-U has developed guidelines for building websites, there is still a great need to improve accessibility on e-government websites.

From Undergraduate (Software) Capstone Projects to Start-ups: Challenges and Opportunities in Higher Institutions of Learning (Middle East Conference on Software Engineering, 2022)
Ogenrwot, Daniel; Olok Tabo, Geoffrey; Aber, Kevin; Nakatumba-Nabende, Joyce
The capstone project is a fundamental part of almost all science and engineering degrees. It is not only a requirement for the partial fulfillment of an accredited university programme but also a method of assessing students’ general mastery of concepts, critical thinking, problem-solving, and transferable skills. Every year, final-year undergraduate students in computing programmes in Uganda build innovative software solutions to real-world problems within and outside their communities. Anecdotal evidence indicates that most of these innovations have the potential for commercialization and transformation into technology-based businesses. However, limited progress has been made in commercializing students’ projects, and promising solutions remain “buried” within academic reports. To this end, our research aims to explain the challenges and opportunities in commercializing students’ capstone projects across two (2) undergraduate computing programmes (Bachelor of Science in Computer Science and Bachelor of Information Technology) offered at Gulu University in Uganda. Using an exploratory research design, we reviewed eighty-six (86) capstone projects, the curricula, and the report of a facilitated workshop with students and stakeholders. This paper articulates the factors hindering the commercialization of undergraduate software capstone projects and recommends mitigating measures.
It also proposes a framework for extending capstone course design from a traditional curriculum structure to an inclusive, industry- and community-oriented approach capable of turning ideas into business start-ups. The findings from this research are expected to inform higher institutions of learning in Africa in developing novel pedagogical approaches for orchestrating (software) capstone project courses that are inclusive and profitable beyond the academic setting.

Gender Bias Evaluation in Luganda-English Machine Translation (Association for Machine Translation in the Americas, 2022-09-07)
Wairagala, Eric Peter; Mukiibi, Jonathan; Babirye, Claire; Nakatumba-Nabende, Joyce; Katumba, Andrew; Ssenkungu, Ivan
We have seen significant growth in building Natural Language Processing (NLP) tools for African languages. However, gender bias in machine translation systems for African languages has not yet been thoroughly investigated, owing to the unavailability of explicit text data for addressing gender bias in machine translation. In this paper, we use transfer learning techniques based on a pre-trained Marian MT model to build machine translation models for English-Luganda and Luganda-English. Our work attempts to evaluate and quantify the gender bias within a Luganda-English machine translation system using the Word Embeddings Fairness Evaluation Framework (WEFE). Luganda is one of the world’s languages with gender-neutral pronouns; we therefore use a small set of trusted gendered examples as the test set to evaluate gender bias by biasing word embeddings. This approach allows us to focus on Luganda-English translations with gender-specific pronouns, and the results of the gender bias evaluation are confirmed by human evaluation.
To compare and contrast the results of the word embeddings evaluation metric, we used a modified version of the existing Translation Gender Bias Index (TGBI) based on grammatical considerations for Luganda.

Integration of design smells and role-stereotypes classification dataset (Data in Brief, 2021)
Ogenrwot, Daniel; Nakatumba-Nabende, Joyce; Chaudron, Michel R.V.
Design smells are recurring patterns of poorly designed (fragments of) software systems that may hinder maintainability. Role-stereotypes indicate generic responsibilities that classes play in system design. Although the concepts of role-stereotypes and design smells are widely divergent, both are significant contributors to the design and maintenance of software systems. To improve software design and maintainability, there is a need to understand the relationship between design smells and role-stereotypes. This paper presents a fine-grained dataset of systematically integrated design smell detection and role-stereotype classification data. The dataset was created from a collection of twelve (12) real-life open-source Java projects mined from GitHub. It consists of 18 design smell columns and 2,513 Java classes (rows) classified according to a six (6)-category role-stereotype taxonomy. We also grouped the dataset into ten (10) clusters using an unsupervised learning algorithm. These clusters are useful for understanding the groups of design smells that often co-occur in a particular role-stereotype category.
The dataset is significant for understanding the non-innate relationship between design smells and role-stereotypes.

Keyword Spotter Model for Crop Pest and Disease Monitoring from Community Radio Data (arXiv preprint, 2019)
Akera, Benjamin; Nakatumba-Nabende, Joyce; Mukiibi, Jonathan; Hussein, Ali; Baleeta, Nathan; Ssendiwala, Daniel; Nalwooga, Samiiha
In societies with well-developed internet infrastructure, social media is the leading medium of communication for various social issues, especially in breaking-news situations. In rural Uganda, however, public community radio is still a dominant means of news dissemination. Community radio gives the general public an audience, especially individuals living in rural areas, and thus plays an important role in giving a voice to those living in the broadcast area. It is an avenue for participatory communication and a tool relevant to both economic and social development. This role is reinforced by the near-ubiquity of mobile phones, which provide access to phone-in and text-in talk shows. In this paper, we describe an approach to analysing readily available community radio data with machine-learning-based speech keyword spotting techniques. We identify keywords of interest related to agriculture and build models to automatically detect these keywords in audio streams. Our contribution through these techniques is a cost-efficient and effective way to monitor food security concerns, particularly in rural areas.
Through keyword spotting and radio talk show analysis, issues such as crop diseases, pests, drought and famine can be captured and fed into an early warning system for stakeholders and policy makers.

Machine Learning Analysis of Radio Data to Uncover Community Perceptions on the Ebola Outbreak in Uganda (ACM Journal on Computing and Sustainable Societies, 2024-09-16)
Nakatumba-Nabende, Joyce; Mukiibi, Jonathan; Bateesa, Tobius Saul; Murindanyi, Sudi; Katumba, Andrew; Mutebi, Chodrine
Radio is a vital channel for people, especially in rural areas, to share their concerns through interactive talk shows. Understanding public perceptions of pandemics is crucial because they influence people’s attitudes and health-seeking behaviors. This study used machine learning to analyze English and Luganda radio broadcast data to understand public perceptions and perspectives on the Ebola outbreak in Uganda. Our findings revealed three main speaker categories: media personalities, community guests and listeners, and government officials. The government made the most significant effort to educate the public about the Ebola outbreak. The analysis showed that the community was hesitant to use Ebola vaccines, believing that they had not been tested on other populations where the Ebola virus had originated. The community was also concerned about the effects of the lockdown measures imposed during the COVID-19 pandemic. The analysis of the radio broadcast data revealed differences in the timing and content of conversations between male and female speakers.
These experiences can inform population-specific policies for handling ongoing and future pandemics.

Machine Translation for African Languages: Community Creation of Datasets and Models in Uganda (African Natural Language Processing, 2022)
Akera, Benjamin; Mukiibi, Jonathan; Sanyu Naggayi, Lydia; Babirye, Claire; Owomugisha, Isaac; Nsumba, Solomon; Nakatumba-Nabende, Joyce; Bainomugisha, Engineer; Mwebaze, Ernest; Quinn, John
Reliable machine translation systems are available for only a small proportion of the world’s languages, the key limitation being a shortage of training and evaluation data. We provide a case study in the creation of such resources by NLP teams who are local to the communities in which these languages are spoken. A parallel text corpus, SALT, was created for five Ugandan languages (Luganda, Runyankole, Acholi, Lugbara and Ateso), and various methods were explored to train and evaluate translation models. The resulting models were found to be effective for practical translation applications, even for languages with no previously available NLP data, achieving a mean BLEU score of 26.2 for translation into English and 19.9 for translation from English. The SALT dataset and models described are publicly available at

The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition (arXiv, 2022)
Mukiibi, Jonathan; Katumba, Andrew; Nakatumba-Nabende, Joyce; Hussein, Ali; Meyer, Josh
Building a usable radio-monitoring automatic speech recognition (ASR) system is a challenging task for under-resourced languages, yet it is paramount in societies where radio is the main medium of public communication and discussion. Initial efforts by the United Nations in Uganda have shown how important it is for national planning to understand the perceptions of rural people who are excluded from social media. However, these efforts are hampered by the absence of transcribed speech datasets.
In this paper, the Makerere Artificial Intelligence research lab releases a 155-hour Luganda radio speech corpus. To our knowledge, this is the first publicly available radio dataset in sub-Saharan Africa. The paper describes the development of the voice corpus and presents baseline Luganda ASR performance results using the Coqui STT toolkit, an open-source speech recognition toolkit.

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition (arXiv e-prints, 2022)
Ifeoluwa Adelani, David; Neubig, Graham; Ruder, Sebastian; Rijhwani, Shruti; Nakatumba-Nabende, Joyce; Ogundepo, Odunayo; Yousuf, Oreen; Moteu Ngoli, Tatiana; Klakow, Dietrich
African languages are spoken by over a billion people but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets and a lack of understanding of the settings in which current methods are effective. In this paper, we make progress towards solutions to these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.

Misinformation detection in Luganda-English code-mixed social media text (arXiv preprint, 2021)
Nabende, Peter; Kabiito, David; Babirye, Claire; Tusiime, Hewitt; Nakatumba-Nabende, Joyce
The increasing occurrence, forms, and negative effects of misinformation on social media platforms have necessitated more misinformation detection tools.
Work is currently being done to address COVID-19 misinformation; however, there are no misinformation detection tools for any of the 40 distinct indigenous Ugandan languages. This paper addresses this gap by presenting basic language resources and a misinformation detection dataset based on code-mixed Luganda-English messages sourced from the Facebook and Twitter social media platforms. Several machine learning methods are applied to the misinformation detection dataset to develop classification models that detect whether a code-mixed Luganda-English message contains misinformation. A 10-fold cross-validation evaluation of the classification methods in an experimental misinformation detection task shows that a Discriminative Multinomial Naïve Bayes (DMNB) method achieves the highest accuracy and F-measure, 78.19% and 77.90% respectively. Support Vector Machine and Bagging ensemble classification models also achieve comparable results. These results are promising since the machine learning models are based on n-gram features from the misinformation detection dataset alone.

Modeling the atmospheric dispersion of SO2 from Mount Nyiragongo (Journal of African Earth Sciences, 2023)
Opio, Ronald; Mugume, Isaac; Nakatumba-Nabende, Joyce; Mbogga, Michael
Mount Nyiragongo, an active volcano, is the most dominant natural source of sulphur dioxide (SO2) in Africa. While a number of studies have employed atmospheric models to simulate the dispersion of SO2 from this mountain, no attempt had been made before this study to use deep learning to bias-correct the model’s estimates. Here, the Weather Research and Forecasting model coupled with chemistry (WRF-Chem) was used to simulate massive SO2 plumes degassed from this mountain between September 2014 and August 2015. Satellite observations by the Ozone Monitoring Instrument (OMI) showed that the SO2 spread over 500 km from the volcano site.
A deep convolutional autoencoder algorithm (WRF-DCA) was then applied to reduce the bias that WRF-Chem showed against the OMI observations. Finally, the correction performance of WRF-DCA was compared with that of a conventional bias correction method, linear scaling (WRF-LS). The performance of WRF-Chem, WRF-DCA, and WRF-LS was analyzed using three metrics: the normalized mean bias (NMB), the root mean square error (RMSE), and Pearson’s correlation coefficient (R). The results showed that WRF-Chem overestimated SO2 at locations near the volcano site and underestimated SO2 at locations further away. It produced an overall average NMB of 0.61 against the OMI observations. WRF-DCA and WRF-LS reduced this bias by an average of 0.25 (40.9%) and 0.21 (34.4%), respectively. Furthermore, although both methods also reduced the RMSE and improved the correlation, WRF-DCA consistently performed better than WRF-LS. This study demonstrates the advantage that deep learning can provide in estimating volcanic SO2 emissions.

Predicting Sweetpotato Sensory Attributes Using DigiEye and Image Analysis as a Breeding Tool (RTBfoods, 2022)
Nakatumba-Nabende, Joyce; Nabiryo, Ann Lisa; Babirye, Claire; Tusubira, Jeremy Francis; Katumba, Andrew; Murindanyi, Sudi; Mutegeki, Henry; Nantongo, Judith; Sserunkuma, Edwin; Nakitto, Mariam; Ssali, Reuben; Davrieux, Fabrice
The objective of the work was to develop, test and evaluate a color and mealiness classification model based on images of sweetpotato roots. A total of 3,018 images were collected from 950 samples between October 2021 and November 2022. The imaged samples were harvested from several sites, including Namulonge, Arua, Bulindi, Nassari, Serere, Rwebitaba, Iganga, Kabarole, Mbale, Mpigi, Busia, Kamuli, Hoima, Kabale and Kenya. Calibrations were done using reference data collected by a sensory panel. Up to twelve cooked roots per genotype were used for sensory evaluation of traits in each session.
Calibrations used various linear and non-linear models. Using linear regression, the calibration for orange color intensity performed well (R² = 0.92, Mean Squared Error (MSE) = 0.58), suggesting that the model is sufficient for field application. For mealiness-by-hand and positive area, the best models have a Mean Absolute Error (MAE) of 2.16 and 9.01, respectively.

Scoring Root Necrosis in Cassava Using Semantic Segmentation (arXiv preprint, 2020)
Tusubira, Jeremy Francis; Akera, Benjamin; Nsumba, Solomon; Nakatumba-Nabende, Joyce; Mwebaze, Ernest
Cassava, a major food crop in many parts of Africa, has been severely affected by Cassava Brown Streak Disease (CBSD). The disease affects tuberous roots and presents symptoms that include a yellow/brown, dry, corky necrosis within the starch-bearing tissues. Cassava breeders currently depend on visual inspection to score necrosis in roots using a qualitative score, which is quite subjective. In this paper we present an approach to automate root necrosis scoring using deep convolutional neural networks with semantic segmentation. Our experiments show that the UNet model performs this task with high accuracy, achieving a mean Intersection over Union (IoU) of 0.90 on the test set. This method provides a quantitative measure for necrosis scoring on root cross-sections, obtained by segmenting and classifying the necrotized and non-necrotized pixels of cassava root cross-sections without any additional feature engineering.
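The mean Intersection over Union reported for the root necrosis segmentation compares predicted and ground-truth pixel labels. The sketch below illustrates that metric under stated assumptions: each mask is a 2D grid in which 1 marks a necrotized pixel and 0 a non-necrotized one, and the mean is taken over these two classes for a single image. The paper's exact averaging scheme (per image, per class, or over the whole test set) is not given in the abstract, and the helper names `class_iou` and `mean_iou` are hypothetical, not taken from the authors' code.

```python
from typing import Sequence

# Hypothetical helpers (not from the paper): masks are 2D grids of per-pixel
# class labels, assumed here to be 1 = necrotized, 0 = non-necrotized.
Mask = Sequence[Sequence[int]]


def class_iou(pred: Mask, truth: Mask, cls: int) -> float:
    """IoU for one class: |pred ∩ truth| / |pred ∪ truth|, counted in pixels."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("prediction and ground-truth masks must have the same shape")
    intersection = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            in_pred, in_truth = p == cls, t == cls
            if in_pred and in_truth:
                intersection += 1
            if in_pred or in_truth:
                union += 1
    # Assumed convention: a class absent from both masks counts as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(pred: Mask, truth: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """Average the per-class IoU values (assumed averaging scheme)."""
    return sum(class_iou(pred, truth, c) for c in classes) / len(classes)


if __name__ == "__main__":
    truth = [[0, 1, 1],
             [0, 1, 0]]
    pred = [[0, 1, 0],
            [0, 1, 1]]
    print(class_iou(pred, truth, 1))  # 0.5
    print(mean_iou(pred, truth))      # 0.5
```

Averaging per-class IoU, rather than reporting plain pixel accuracy, keeps a small necrotic region from being swamped by the much larger area of healthy tissue in a cross-section.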