A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data
Date
2022
Publisher
AfricaNLP workshop
Abstract
Manually extracting functional themes and topics from a large text corpus is usually
infeasible. Text mining techniques such as topic modeling are therefore needed, since
they provide a mechanism for inferring topics from a corpus of text automatically.
This paper compares topic modeling and topic classification models
on Luganda text data. For topic modeling, we considered Non-negative Matrix
Factorization (NMF), an unsupervised machine learning algorithm
that extracts hidden patterns from unlabeled text data to create latent topics.
For topic classification, we considered classic approaches, neural networks, and
pretrained algorithms. Bidirectional Encoder Representations from Transformers
(BERT), a pretrained model that uses an attention mechanism to learn
contextual relations between words (or sub-words) in a text, and the Support Vector
Machine (SVM), a classic model that analyzes particular properties
of learning within text data, achieved the best results for topic classification. Our
results indicate that topic modeling and topic classification algorithms produce
relatively similar results when the topic classification algorithms are trained on a balanced
dataset.
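NMF, as summarized above, approximates a non-negative document-term matrix V as the product of a document-topic matrix W and a topic-term matrix H, and the largest entries in each row of H are read as that topic's words. The block below is a minimal pure-Python sketch using the Lee and Seung multiplicative update rules for the squared Frobenius loss; it is not the authors' implementation, whose details are not given in this record. The toy English corpus (a stand-in for Luganda text), the choice of k=2 topics, the iteration count, raw term counts instead of TF-IDF, and all function names (`doc_term_matrix`, `nmf`, `top_words`, `frobenius_error`) are hypothetical choices for illustration. A real pipeline would more likely use a library implementation such as scikit-learn's `NMF`.

```python
import random
from collections import Counter


def doc_term_matrix(docs):
    """Build a documents x terms count matrix (list of lists) and its vocabulary."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    V = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for w, c in Counter(toks).items():
            row[index[w]] = float(c)
        V.append(row)
    return V, vocab


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply an (n x p) matrix by a (p x m) matrix."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def frobenius_error(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (n x m, non-negative) as W (n x k) times H (k x m) with
    Lee-Seung multiplicative updates, which keep every entry non-negative."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num_h = matmul(Wt, V)             # W^T V
        den_h = matmul(matmul(Wt, W), H)  # W^T W H
        H = [[H[a][j] * num_h[a][j] / (den_h[a][j] + eps) for j in range(m)]
             for a in range(k)]
        Ht = transpose(H)
        num_w = matmul(V, Ht)             # V H^T
        den_w = matmul(W, matmul(H, Ht))  # W H H^T
        W = [[W[i][a] * num_w[i][a] / (den_w[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def top_words(H, vocab, n=3):
    """Return the n highest-weighted terms of each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in H]


# Hypothetical toy corpus standing in for Luganda documents (assumption).
docs = [
    "farmers plant maize season rain",
    "rain helps farmers harvest maize",
    "team scored goal football match",
    "football fans watched match goal",
]
V, vocab = doc_term_matrix(docs)
W, H = nmf(V, k=2)
for t, words in enumerate(top_words(H, vocab)):
    print(f"topic {t}: {words}")
# Each document's dominant topic is the argmax of its row in W.
print([max(range(2), key=lambda a: row[a]) for row in W])
```

On this toy input the two factors will typically separate the farming terms from the football terms, but multiplicative updates only reach a local optimum, so the exact topics can depend on the random seed and on k.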
Keywords
Topic modelling, Topic classification, Word embeddings, Luganda
Citation
Tobius, B. S., Babirye, C., Nakatumba-Nabende, J., & Katumba, A. (2022, March). A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data. In 3rd Workshop on African Natural Language Processing.