Sambasivam, G.
Opiyo, Geoffrey Duncan
2023-07-11
2021
Sambasivam, G., & Opiyo, G. D. (2021). A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural networks. Egyptian Informatics Journal, 22(1), 27-34. https://doi.org/10.1016/j.eij.2020.02.007
https://nru.uncst.go.ug/handle/123456789/9056
This work is inspired by a Kaggle competition in which we participated, held as part of the Fine-Grained Visual Categorization workshop at CVPR 2019 (Conference on Computer Vision and Pattern Recognition). The competition aimed at detecting cassava diseases using 5 fine-grained cassava leaf disease categories with 10,000 labeled images collected during a regular survey in Uganda. Traditionally, this detection is done mostly through physical inspection and supervision of cassava plants in the garden by farmers or agricultural extension workers from NAADS (National Agricultural Advisory Services), and is then reported to NARO (National Agricultural Research Organisation) for further analysis. However, this process can be tiresome and capital intensive, and it cannot detect cassava infection in time for farmers to apply preventive techniques to the non-infected plants. Timely detection would improve yields, which in turn enlarges the African food basket and strengthens food security against famine. Training CNNs (Convolutional Neural Networks) on the provided dataset to achieve high accuracy was very challenging for two reasons: the dataset was small, and it has high class imbalance, being heavily biased towards the CMD (Cassava Mosaic Disease) and CBSD (Cassava Brown Streak Virus Disease) classes. Class imbalance is problematic in machine learning and exists in many domains.
Not all real-world data is balanced; in fact, a perfectly balanced real-world dataset is rare. In recent years, a lot of research has been done on two-class problems such as credit card fraud and tumor detection, among others. Interestingly, class imbalance in multi-class image datasets has received little attention. This paper therefore focuses on techniques to achieve an accuracy score of over 93% using class weights, SMOTE (Synthetic Minority Over-sampling Technique), and focal loss with deep convolutional neural networks trained from scratch. The goal was to counter the high class imbalance so that the model can accurately predict underrepresented classes.
en
Agriculture
Cassava mosaic detection
Rectifier Linear Unit
Synthetic minority over-sampling technique
Stochastic gradient descent
A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural networks
Article
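
Two of the imbalance techniques named in the abstract, class weighting and focal loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record does not give their weighting scheme or hyperparameters, so the inverse-frequency formula, the value `gamma=2.0`, the function names, and the per-class image counts below are all assumptions chosen only for illustration.

```python
import numpy as np

def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Assumed scheme for illustration; rarer classes get larger weights.
    """
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def focal_loss(probs, labels, alpha=None, gamma=2.0, eps=1e-7):
    """Mean multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  : (n_samples, n_classes) predicted class probabilities
    labels : (n_samples,) integer class indices
    alpha  : optional per-class weights; None means no class weighting
    gamma  : focusing parameter; gamma=0 reduces to cross-entropy
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    labels = np.asarray(labels)
    # Probability the model assigned to the true class of each sample.
    p_t = probs[np.arange(len(labels)), labels]
    alpha_t = 1.0 if alpha is None else np.asarray(alpha, dtype=float)[labels]
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

# Hypothetical per-class image counts for a 5-class problem
# (made up for this sketch, NOT the counts of the cassava dataset).
counts = [100, 300, 150, 500, 50]
weights = balanced_class_weights(counts)

# Two toy predictions: one confident and correct, one uncertain.
probs = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
                  [0.20, 0.20, 0.20, 0.20, 0.20]])
labels = np.array([0, 4])

plain_ce = focal_loss(probs, labels, gamma=0.0)           # ordinary cross-entropy
focal = focal_loss(probs, labels, gamma=2.0)              # down-weights easy samples
weighted_focal = focal_loss(probs, labels, alpha=weights) # also favours rare classes
```

In this sketch the focal term `(1 - p_t)**gamma` shrinks the contribution of samples the model already classifies confidently, while the per-class weights raise the contribution of underrepresented classes; the two can be used separately or together.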