Browsing by Author "Babirye, Claire"
A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data (AfricaNLP workshop, 2022)
Bateesa, Tobius Saul; Babirye, Claire; Nakatumba-Nabende, Joyce
Manually extracting functional themes and topics from a large text corpus is usually infeasible. Text mining techniques such as topic modeling are therefore needed, because they infer topics from a corpus of text automatically. This paper applies topic modeling and topic classification models to Luganda text data. For topic modeling, we considered Non-negative Matrix Factorization (NMF), an unsupervised machine learning algorithm that extracts hidden patterns from unlabeled text data to create latent topics. For topic classification, we considered classic approaches, neural networks, and pretrained models. Bidirectional Encoder Representations from Transformers (BERT), a pretrained model that uses an attention mechanism to learn contextual relations between words (or sub-words) in a text, and a Support Vector Machine (SVM), a classic model that analyzes particular properties of learning within text data, record the best results for topic classification. Our results indicate that topic modeling and topic classification algorithms produce relatively similar results when the topic classification algorithms are trained on a balanced dataset.

A dataset of cassava whitefly count images (Data in Brief, 2022)
Nakatumba-Nabende, Joyce; Tusubira, Jeremy Francis; Babirye, Claire; Nsumba, Solomon; Omongo Abu, Christopher
Whiteflies are insect vectors that affect a variety of plants such as tomatoes, cabbages, sweet potatoes, eggplants, and cassava. In Uganda, whiteflies are a major contributor to the spread of Cassava Brown Streak Disease (CBSD). By sucking sap from infected cassava plants, whiteflies can potentially transfer the Cassava Brown Streak Virus, which causes CBSD, to uninfected plants nearby when they migrate. When they attack cassava plants in large numbers, whiteflies can also cause significant physical damage through feeding, which can eventually lead to leaf loss or plant death. Whiteflies also excrete “honeydew”, which harbors a fungus known as “sooty mold” that covers the leaves and limits their access to sunlight, which in turn affects the plant's food production. As part of their work, cassava breeders often assess whitefly populations in cassava fields through manual visual inspection, which can be arduous and time-consuming. This paper presents a cassava whitefly dataset curated to enable researchers to build solutions that automate whitefly detection and counting. The dataset contains 3,000 images captured at a whitefly trial site in Uganda and depicts infestation levels ranging from low to high. The data has already been used to provide a proof-of-concept solution for whitefly counting based on machine learning approaches.

Machine Translation for African Languages: Community Creation of Datasets and Models in Uganda (African Natural Language Processing, 2022)
Akera, Benjamin; Mukiibi, Jonathan; Sanyu Naggayi, Lydia; Babirye, Claire; Owomugisha, Isaac; Nsumba, Solomon; Nakatumba-Nabende, Joyce; Bainomugisha, Engineer; Mwebaze, Ernest; Quinn, John
Reliable machine translation systems are available for only a small proportion of the world's languages, the key limitation being a shortage of training and evaluation data. We provide a case study in the creation of such resources by NLP teams who are local to the communities in which these languages are spoken. A parallel text corpus, SALT, was created for five Ugandan languages (Luganda, Runyankole, Acholi, Lugbara and Ateso), and various methods were explored to train and evaluate translation models. The resulting models were found to be effective for practical translation applications, even for languages with no previously available NLP data, achieving a mean BLEU score of 26.2 for translation into English and 19.9 for translation from English. The SALT dataset and models described are publicly available at

Misinformation detection in Luganda-English code-mixed social media text (arXiv preprint, 2021)
Nabende, Peter; Kabiito, David; Babirye, Claire; Tusiime, Hewitt; Nakatumba-Nabende, Joyce
The increasing occurrence, forms, and negative effects of misinformation on social media platforms have necessitated more misinformation detection tools. Work is currently being done to address COVID-19 misinformation; however, there are no misinformation detection tools for any of the 40 distinct indigenous Ugandan languages. This paper addresses this gap by presenting basic language resources and a misinformation detection dataset based on code-mixed Luganda-English messages sourced from the Facebook and Twitter social media platforms. Several machine learning methods are applied to the dataset to develop classification models that detect whether a code-mixed Luganda-English message contains misinformation. A 10-fold cross-validation evaluation of the classification methods on an experimental misinformation detection task shows that a Discriminative Multinomial Naïve Bayes (DMNB) method achieves the highest accuracy and F-measure, at 78.19% and 77.90% respectively. Support Vector Machine and Bagging ensemble classification models also achieve comparable results. These results are promising since the machine learning models are based on n-gram features from only the misinformation detection dataset.

Predicting Sweetpotato Sensory Attributes Using DigiEye and Image Analysis as a Breeding Tool (RTBfoods, 2022)
Nakatumba-Nabende, Joyce; Nabiryo, Ann Lisa; Babirye, Claire; Tusubira, Jeremy Francis; Katumba, Andrew; Murindanyi, Sudi; Mutegeki, Henry; Nantongo, Judith; Sserunkuma, Edwin; Nakitto, Mariam; Ssali, Reuben; Davrieux, Fabrice
The objective of the work was to develop, test and evaluate a color and mealiness classification model based on images of sweetpotato roots. A total of 3018 images were collected from 950 samples from October 2021 to November 2022. The imaged samples were harvested from several sites, including Namulonge, Arua, Bulindi, Nassari, Serere, Rwebitaba, Iganga, Kabarole, Mbale, Mpigi, Busia, Kamuli, Hoima, Kabale and Kenya. Calibrations were done using reference data collected by a sensory panel. Up to twelve cooked roots per genotype were used for sensory evaluation of traits per session. Calibrations used various linear and non-linear models. With linear regression, the calibration for orange color intensity performed well (R² = 0.92, Mean Squared Error (MSE) = 0.58), suggesting that the model is sufficient for field application. For mealiness-by-hand and positive area, the best models have a Mean Absolute Error (MAE) of 2.16 and 9.01, respectively.
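The sweetpotato calibration results are reported as R², MSE, and MAE. As a minimal sketch of how these three metrics relate for a linear-regression calibration, the Python below fits a one-feature ordinary least squares line with the closed-form formula and then computes all three. The single image feature, the function names, and the toy data are hypothetical assumptions for illustration; they are not the DigiEye features, sensory-panel scores, or code used in the study.

```python
from statistics import fmean


def fit_simple_linear_regression(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x (one feature)."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n          # mean squared error
    mae = sum(abs(r) for r in residuals) / n         # mean absolute error
    y_mean = fmean(y_true)
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                         # coefficient of determination
    return mse, mae, r2


# Hypothetical toy data (NOT from the study): an image-derived color
# feature (x) against a sensory-panel orange-intensity score (y).
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
panel_score = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_simple_linear_regression(feature, panel_score)
predicted = [a + b * x for x in feature]
mse, mae, r2 = regression_metrics(panel_score, predicted)
print(f"MSE={mse:.3f} MAE={mae:.3f} R2={r2:.3f}")  # prints MSE=0.480 MAE=0.640 R2=0.600
```

MSE and MAE are expressed in the units of the target score (squared units for MSE), while R² is unitless, so error values for different traits are only directly comparable when those traits are scored on the same scale.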