A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data

Bateesa, Tobius Saul; Babirye, Claire; Nakatumba-Nabende, Joyce

Files

A COMPARISON OF TOPIC MODELING AND CLASSIFICATION MACHINE LEARNING ALGORITHMS ON LUGANDA DATA.pdf (1.14 MB)

Date

2022

Authors

Bateesa, Tobius Saul

Babirye, Claire

Nakatumba-Nabende, Joyce

Publisher

AfricaNLP workshop

Abstract

Manually extracting functional themes and topics from a large text corpus is usually infeasible. Text mining techniques such as topic modeling are therefore needed, because they provide a mechanism to infer topics from a corpus automatically. This paper discusses topic modeling and topic classification models on Luganda text data. For topic modeling, we considered Non-negative Matrix Factorization (NMF), an unsupervised machine learning algorithm that extracts hidden patterns from unlabeled text data to create latent topics. For topic classification, we considered classic approaches, neural networks, and pretrained models. The best topic classification results were recorded by Bidirectional Encoder Representations from Transformers (BERT), a pretrained model whose attention mechanism learns contextual relations between words (or sub-words) in a text, and by a Support Vector Machine (SVM), a classic model which analyzes particular properties of learning within text data. Our results indicate that topic modeling and topic classification algorithms produce relatively similar results when the topic classification algorithms are trained on a balanced dataset.
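To illustrate how NMF can turn unlabeled text into latent topics, the following is a minimal sketch, not the authors' implementation. It assumes a plain bag-of-words count matrix and the Lee-Seung multiplicative update rule; the abstract does not say which features or solver the paper used. The four toy documents are hypothetical English placeholders, not Luganda data, and the names `nmf`, `frobenius_error`, and `top_terms` are illustrative only.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (docs x terms) as W (docs x k) times H (k x terms).

    Assumption: Lee-Seung multiplicative updates for the squared
    Frobenius error; all entries stay non-negative by construction.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def frobenius_error(V, W, H):
    """Squared reconstruction error ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, WH) for v, r in zip(vr, rr))


# Hypothetical placeholder documents (not taken from the paper's corpus).
docs = [
    "market price maize market",
    "maize price farmer",
    "clinic nurse malaria clinic",
    "malaria nurse patient",
]
vocab = sorted({tok for d in docs for tok in d.split()})
# Document-term count matrix: one row per document, one column per term.
V = [[d.split().count(term) for term in vocab] for d in docs]

W, H = nmf(V, k=2)

# Each row of H is a topic; its largest weights name the topic's key terms.
top_terms = [
    [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:3]]
    for row in H
]
for t, terms in enumerate(top_terms):
    print(f"topic {t}: {terms}")
```

Each row of `W` gives a document's weight on every topic, so the index of its largest entry can serve as a rough topic assignment for that document. Choosing `k` and the text features (raw counts here) is a modeling decision that the sketch does not attempt to settle.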

Keywords

Topic modelling, Topic classification, Word embeddings, Luganda

Citation

Tobius, B. S., Babirye, C., Nakatumba-Nabende, J., & Katumba, A. (2022, March). A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data. In 3rd Workshop on African Natural Language Processing.

URI

https://openreview.net/forum?id=SYbV9qzNLZ9
https://nru.uncst.go.ug/handle/123456789/5480

Collections

Engineering and Technology
