IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
dc.contributor.author | Adelani, David Ifeoluwa | |
dc.contributor.author | Zhuang, Jian Yun | |
dc.contributor.author | Ochieng, Millicent | |
dc.contributor.author | Mukiibi, Jonathan | |
dc.contributor.author | Kabongo, Salomon | |
dc.contributor.author | Stenetorp, Pontus | |
dc.date.accessioned | 2025-03-11T07:27:54Z | |
dc.date.available | 2025-03-11T07:27:54Z | |
dc.date.issued | 2024-06-05 | |
dc.description.abstract | Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks because appropriate or comprehensive benchmarks are lacking outside of high-resource languages. In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 17 typologically diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multiple-choice knowledge-based question answering (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and six proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We also observe a significant performance gap between open and proprietary models, with the highest-performing open model, Gemma 2 27B, reaching only 63% of the performance of the best-performing proprietary model, GPT-4o. In addition, machine-translating the test set into English before evaluation helped to close the gap for larger English-centric models, such as Gemma 2 27B and LLaMa 3.1 70B. These findings suggest that more effort is needed to develop and adapt LLMs for African languages. | |
dc.identifier.citation | Adelani, D. I., Ojo, J., Azime, I. A., Zhuang, J. Y., Alabi, J. O., He, X., ... & Stenetorp, P. (2024). IrokoBench: A new benchmark for African languages in the age of large language models. arXiv preprint arXiv:2406.03368. | |
dc.identifier.other | https://doi.org/10.48550/arXiv.2406.03368 | |
dc.identifier.uri | https://nru.uncst.go.ug/handle/123456789/10104 | |
dc.language.iso | en | |
dc.publisher | arXiv | |
dc.title | IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | |
dc.type | Article |