A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data

dc.contributor.author: Bateesa, Tobius Saul
dc.contributor.author: Babirye, Claire
dc.contributor.author: Nakatumba-Nabende, Joyce
dc.date.accessioned: 2022-11-27T16:35:39Z
dc.date.available: 2022-11-27T16:35:39Z
dc.date.issued: 2022
dc.description.abstract: Manually extracting functional themes and topics from a large text corpus is usually infeasible. Text mining techniques such as topic modeling are needed to infer topics from a corpus of text automatically. This paper discusses topic modeling and topic classification models on Luganda text data. For topic modeling, we considered Non-negative Matrix Factorization (NMF), an unsupervised machine learning algorithm that extracts hidden patterns from unlabeled text data to create latent topics. For topic classification, we considered classic approaches, neural networks, and pretrained algorithms. Bidirectional Encoder Representations from Transformers (BERT), a pretrained model that uses an attention mechanism to learn contextual relations between words (or sub-words) in a text, and the Support Vector Machine (SVM), a classic model which analyzes particular properties of learning within text data, recorded the best results for topic classification. Our results indicate that topic modeling and topic classification algorithms produce relatively similar results when topic classification algorithms are trained on a balanced dataset. [en_US]
dc.identifier.citation: Tobius, B. S., Babirye, C., Nakatumba-Nabende, J., & Katumba, A. (2022, March). A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data. In 3rd Workshop on African Natural Language Processing. [en_US]
dc.identifier.uri: https://openreview.net/forum?id=SYbV9qzNLZ9
dc.identifier.uri: https://nru.uncst.go.ug/handle/123456789/5480
dc.language.iso: en [en_US]
dc.publisher: AfricaNLP workshop [en_US]
dc.subject: Topic modelling [en_US]
dc.subject: Topic classification [en_US]
dc.subject: Word embeddings [en_US]
dc.subject: Luganda [en_US]
dc.title: A Comparison of Topic Modeling and Classification Machine Learning Algorithms on Luganda Data [en_US]
dc.type: Other [en_US]
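
The abstract describes NMF as an unsupervised method that factors unlabeled text data into latent topics. As a rough illustration of that idea only (not the authors' pipeline), the following minimal sketch factors a document-term count matrix V into non-negative factors W (document-topic weights) and H (topic-term weights) using the standard multiplicative update rules for the squared-error objective. The four toy documents are hypothetical English placeholders rather than Luganda data, and the function name, iteration count, and seed are assumptions chosen for illustration. In practice a library implementation such as scikit-learn's NMF would normally be used instead.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate V (docs x terms) by W @ H with W, H >= 0.

    Uses multiplicative updates for the squared-error objective.
    Illustrative sketch only; iters, seed and eps are arbitrary choices.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


# Hypothetical toy corpus (English placeholders, NOT the paper's Luganda data).
docs = [
    "health clinic doctor",
    "health clinic doctor medicine",
    "football match goal",
    "football match goal team",
]
vocab = sorted({tok for d in docs for tok in d.split()})
# Document-term count matrix: one row per document, one column per term.
V = [[d.split().count(term) for term in vocab] for d in docs]

W, H = nmf(V, k=2)

# Dominant latent topic per document: the largest weight in its row of W.
doc_topics = [max(range(2), key=lambda t: row[t]) for row in W]

# Highest-weighted terms per topic, read from the rows of H.
for t, weights in enumerate(H):
    ranked = sorted(zip(vocab, weights), key=lambda p: p[1], reverse=True)
    print(f"topic {t}:", [term for term, _ in ranked[:3]])
print("dominant topic per document:", doc_topics)
```

The topic indices are arbitrary labels: NMF has no notion of which latent topic corresponds to which human-assigned category, which is one reason comparing topic modeling output with supervised classification output requires mapping topics to labels first.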
Files
Original bundle
Name: A COMPARISON OF TOPIC MODELING AND CLASSIFICATION MACHINE LEARNING ALGORITHMS ON LUGANDA DATA.pdf
Size: 1.14 MB
Format: Adobe Portable Document Format
Description: Conference Paper
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission