Browsing by Author "Nabende, Peter"
Now showing 1 - 10 of 10
Building Text and Speech Benchmark Datasets and Models for Low-Resourced East African Languages: Experiences and Lessons (Applied AI Letters, 2025-03-26)
Nakatumba-Nabende, Joyce; Nabende, Peter; Mukiibi, Jonathan; Mutebi, Chodrine; Katumba, Andrew
Africa has over 2000 languages; however, those languages are not well represented in the existing natural language processing ecosystem. African languages lack the essential digital resources needed to engage effectively in advancing language technologies. There is a need to generate high-quality natural language processing resources for low-resourced African languages. Obtaining high-quality speech and text data is expensive and tedious because it can involve manual sourcing and verification of data sources. This paper discusses the process taken to curate and annotate text and speech datasets for five East African languages: Luganda, Runyankore-Rukiga, Acholi, Lumasaba, and Swahili. We also present results obtained from baseline models for machine translation, topic modeling and classification, sentiment classification, and automatic speech recognition tasks. Finally, we discuss the experiences, challenges, and lessons learned in creating the text and speech datasets.

MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages (arXiv preprint, 2023-05-23)
Dione, Cheikh M. Bamba; Nabende, Peter; Mukiibi, Jonathan; Uchechukwu, Chinedu; Abdullahi, Muhammad; Klakow, Dietrich
In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the UD (Universal Dependencies) guidelines. We conducted extensive POS baseline experiments using conditional random fields and several multilingual pretrained language models. We applied various cross-lingual transfer models trained with data available in UD.
Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems more effective for POS tagging in unseen languages.

Misinformation detection in Luganda-English code-mixed social media text (arXiv preprint, 2021)
Nabende, Peter; Kabiito, David; Babirye, Claire; Tusiime, Hewitt; Nakatumba-Nabende, Joyce
The increasing occurrence, forms, and negative effects of misinformation on social media platforms have necessitated more misinformation detection tools. Work is currently being done on COVID-19 misinformation; however, there are no misinformation detection tools for any of the 40 distinct indigenous Ugandan languages. This paper addresses this gap by presenting basic language resources and a misinformation detection dataset based on code-mixed Luganda-English messages sourced from the Facebook and Twitter social media platforms. Several machine learning methods are applied to the misinformation detection dataset to develop classification models that detect whether a code-mixed Luganda-English message contains misinformation. A 10-fold cross-validation evaluation of the classification methods in an experimental misinformation detection task shows that a Discriminative Multinomial Naïve Bayes (DMNB) method achieves the highest accuracy and F-measure, 78.19% and 77.90% respectively. Support Vector Machine and Bagging ensemble classification models also achieve comparable results.
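The 10-fold cross-validation evaluation described above partitions the labelled messages into ten folds. Each fold is held out once as the test set while the remaining nine are used for training, and the per-fold scores are typically averaged. The Python sketch below shows only that index bookkeeping. The function names `k_fold_indices` and `cross_validate`, the fixed shuffling seed, and the absence of stratification are illustrative assumptions, not details reported in the paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every sample index appears in exactly one test fold; fold sizes differ
    by at most one when n_samples is not divisible by k. No stratification
    by class label is done here (an assumption of this sketch).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(score_fold, n_samples, k=10, seed=0):
    """Average a per-fold score; score_fold(train, test) is supplied by the caller."""
    scores = [score_fold(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    folds = list(k_fold_indices(23, k=10))
    print([len(test) for _, test in folds])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

In practice, `score_fold` would train a classifier (such as DMNB or an SVM) on the `train` indices and return its accuracy or F-measure on the `test` indices.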
These results are promising since the machine learning models are based on n-gram features from only the misinformation detection dataset.

On the Goodness of Fit of Parametric and Non-Parametric Data Mining Techniques: The Case of Malaria Incidence Thresholds in Uganda (Health and Technology, 2021)
Bbosa, Francis Fuller; Nabukenya, Josephine; Nabende, Peter; Wesonga, Ronald
This study aimed to identify which data mining technique (parametric or non-parametric) best fits the predictions on an imbalanced malaria incidence dataset. The researchers compared parametric techniques, namely naïve Bayes and logistic regression, against non-parametric techniques, namely support vector machines and artificial neural networks. Their goodness of fit and prediction were assessed using 10-fold and 5-fold cross-validation on an independent validation dataset to determine which model best fits the predictions on the imbalanced malaria incidence dataset. The 10-fold cross-validation outperformed the 5-fold cross-validation in all performance metrics. The results per model were as follows. The naïve Bayes classifier attained an accuracy of 69%, with a sensitivity of 90.9%, a specificity of 55.6%, a precision of 55.6%, and an F-measure of 69.0%. Logistic regression achieved an accuracy of 65.5%, with a sensitivity of 83.3%, a specificity of 52.9%, a precision of 55.6%, and an F-measure of 66.7%. Support vector machines achieved an accuracy of 82.8%, with a sensitivity of 88.2%, a specificity of 75.0%, a precision of 83.3%, and an F-measure of 85.7%. Artificial neural networks registered an accuracy of 89.7%, with a sensitivity of 94.1%, a specificity of 83.3%, a precision of 88.9%, and an F-measure of 91.4%.
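Assuming the reported F-measure is the standard balanced F1 score, it is the harmonic mean of precision P and sensitivity (recall) R: F = 2PR / (P + R). The minimal Python sketch below recomputes F1 from the precision and sensitivity figures quoted in the abstract as a consistency check. The function name `f_measure` is illustrative. The inputs are the rounded published percentages, so agreement is only expected to one decimal place.

```python
def f_measure(precision, recall):
    """Balanced F1 score: harmonic mean of precision and recall (fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, sensitivity) pairs as reported in the abstract
reported = {
    "naive Bayes": (0.556, 0.909),
    "logistic regression": (0.556, 0.833),
    "support vector machines": (0.833, 0.882),
    "artificial neural networks": (0.889, 0.941),
}

if __name__ == "__main__":
    for model, (p, r) in reported.items():
        print(f"{model}: F1 = {100 * f_measure(p, r):.1f}%")
    # naive Bayes: F1 = 69.0%
    # logistic regression: F1 = 66.7%
    # support vector machines: F1 = 85.7%
    # artificial neural networks: F1 = 91.4%
```

The recomputed values match the reported F-measures (69.0%, 66.7%, 85.7%, 91.4%). This supports reading them as F1 scores computed from the quoted precision and sensitivity.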
Non-parametric data mining techniques, in the form of artificial neural networks and support vector machines, outperformed the parametric naïve Bayes technique in making predictions from the imbalanced malaria incidence dataset, registering higher F-measure values of 91.4% and 85.7%, respectively.

Ontology Driven Machine learning Approach for Disease Name Extraction from Twitter Messages (IEEE, 2017)
Mwebaze, Ernest; Nabende, Peter; Magumba, Mark Abraham
Twitter, and social media as a whole, has great potential as a source of disease surveillance data; however, the general messiness of tweets presents several challenges for standard information extraction methods. Current methods for disease surveillance on Twitter rely on inflexible keyword-based approaches. These require messages to be pre-filtered on the basis of a disease name supplied a priori, and they cannot detect new ailments. In this paper, we present an ontology-based machine learning approach to extract disease names and expressions describing ailments from tweets. It may be employed as part of a larger general-purpose system for automated disease incidence monitoring. We also propose a simple methodology for automatic detection and correction of errors.

Reliability of Predictions Using Hybrid Models: The Case of Malaria Incidence Rates in Uganda (Journal of Health Informatics in Africa, 2020)
Nabende, Peter; Bbosa, Francis Fuller; Wesonga, Ronald; Nabukenya, Josephine
Background and purpose: The reliability of estimates from independent predictive data mining techniques is a complex problem. This can be attributed to cross-cutting weaknesses of individual techniques that affect the reliability of their predictions. These include collinearity due to the high dimensionality of attributes in a dataset, bias due to underfitting and overfitting of data, and noise accumulation due to outliers.
This study therefore aimed to develop a hybrid data mining technique for predicting reliable malaria incidence rate thresholds. Methods: The decision tree and naïve Bayes classifiers were used to build a hybrid prediction model. Results of the developed hybrid model were compared with those of the independent data mining models using 10-fold cross-validation on a previously unseen dataset. Accuracy, F-measure, and the area under the receiver operating characteristic curve (AUC) were the key performance metrics used to evaluate the generalizability of the hybrid model in comparison to the independent models. Results: The hybrid classifier attained an accuracy of 79.3% and an F-measure of 84.2%. The naïve Bayes classifier achieved an accuracy and an F-measure of 69% each. The decision tree classifier registered an accuracy of 72.4% and an F-measure of 80%. Conclusions: The developed hybrid model outperformed both the independent decision tree and naïve Bayes models. Hence, merging several independent homogeneous predictive data mining techniques enhances the accuracy of the estimates, leading to reliable estimates.

The Institutionalisation of Information Security Management Practices in selected Organisations in Uganda (International Journal of Advanced Research, 2023)
Ahimbisibwe, Benjamin K.; Nabende, Peter
The study aimed to examine the extent to which information security management practices were institutionalised in corporate organisations. Evidence shows that when organisations fail to entrench information security management practices (ISMPs) in their structures, they open the gateway for attacks, threat actors, and information breaches to harm information assets with ease. The study explored the phenomenon in its social setting, hence the adoption of a descriptive research design as the research methodology.
Institutional theory was adopted as a new dimension in examining information security management in organisations. This theory suggests that control mechanisms could be used to effectively entrench security guidelines in organisations: coercive, normative, and mimetic pressures, together with management commitment. The institutionalisation process (development, implementation and maintenance, and evaluation) was also methodically scrutinised. The researcher relied on human experience to make sense of the institutionalised processes. Extant literature was reviewed, and survey questionnaires were developed based on the eleven ISMPs and administered to purposively selected respondents from the two organisations. The eleven ISMPs covered are:
- the state of the information security policy;
- asset management;
- secure information sharing;
- supply chain security;
- access management;
- network security controls;
- portable and removable media security;
- remote access security;
- protective monitoring of information systems;
- implementation of information security back-ups;
- security accreditation by professional bodies.

Data analysis was done using SPSS. Findings indicate that the organisations have not fully incorporated all eleven ISMPs as best practices and standards. Based on the results from the field, the research questions were partly answered. Recommendations included implementing ISMPs to address the deficiencies identified, customising security guidelines to protect information assets, and institutionalising security practices at all levels. Overall, the study was a positive step towards the institutionalisation of ISMPs in organisations.

Towards Computational Resource Grammars for Runyankore and Rukiga (European Language Resources Association (ELRA), 2020)
Nabende, Peter; Bamutura, David; Ljunglöf, Peter
In this paper, we present computational resource grammars of the Runyankore and Rukiga (R&R) languages.
Runyankore and Rukiga are two under-resourced Bantu languages spoken by about 6 million people indigenous to South Western Uganda, East Africa. We used Grammatical Framework (GF), a multilingual grammar formalism and special-purpose functional programming language, to formalise the descriptive grammar of these languages. To the best of our knowledge, these computational resource grammars are the first attempt at creating language resources for R&R. In future work, we plan to use these grammars to bootstrap the generation of other linguistic resources, such as multilingual corpora, that would make data-driven approaches to natural language processing feasible. In the meantime, they can be used to build Computer-Assisted Language Learning (CALL) applications for these languages, among other uses.

Towards Web-based Productivity Analysis and Reporting (2020)
Mwebaza, Michael; Mnzava, Emmanuel; Nakatumba, Joyce; Nabende, Peter
In this paper, we propose a Web-based Productivity Analysis and Reporting Tool (W-PART) for applications that require gathering productivity data from remote sites for a given business entity. W-PART aims to reduce the productivity data entry and analysis workload from a central input point. Other benefits that we expect to realize from the tool include reduced data loss and less time spent entering productivity data. The tool is intended for use by any organization in a ‘developing area’, but it relies on the availability of the Internet in some form. We report on the current status of the tool's development and discuss its software implementation prospects. To discuss different aspects of the tool, we use banks as a case study.
We present general implementation requirements for the tool, which we expect will lead to a variety of implementation options.

Using Data Analytics to Strengthen Monitoring and Surveillance of Routine Immunization Coverage for Children under One Year in Uganda (HEALTHINF, 2021)
Nantongo, Bartha Alexandra; Nabukenya, Josephine; Nabende, Peter
Immunization coverage is a traditional key performance indicator that enables stakeholders to monitor child health, investigate gaps, and take remedial actions. Its validity is continuously challenged by the neglect of unstructured data and of process indicators that track small changes and milestones. Empirical evidence indicates that digitalized immunization systems establish coverage from structured data, yet widely used administrative and household survey estimates are often inaccurate or untimely. The government instituted awareness, accessibility, and results-based performance approaches. However, stakeholders struggle to monitor performance accurately against the Global Vaccine Action Plan coverage targets. This encourages inappropriate strategy implementation, leading to persistently low coverage and declining trends. There is scant literature substantiating the importance of comprehensive immunization indicators in monitoring evidence-based and timely interventions. For this reason, health workers have failed to appreciate immunization process indicators and their monitoring role. The study aims to develop a real-time immunization coverage monitoring framework that supports evidence-based strategy implementation using prescriptive analytics. The envisaged artifact analyzes a variety of data and monitors immunization performance against comprehensive indicators. It is a less resource-demanding strategy that provides accurate, real-time insights to support intervention implementation decisions. This study will follow an explanatory research approach, first collecting quantitative data and later qualitative data for in-depth analysis.