A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural networks

dc.contributor.author: Sambasivam, G.
dc.contributor.author: Opiyo, Geoffrey Duncan
dc.date.accessioned: 2023-07-11T16:12:53Z
dc.date.available: 2023-07-11T16:12:53Z
dc.date.issued: 2021
dc.description.abstract: This work is inspired by a Kaggle competition in which we participated, held as part of the Fine-Grained Visual Categorization workshop at CVPR 2019 (Conference on Computer Vision and Pattern Recognition). The competition aimed at detecting cassava diseases across 5 fine-grained cassava leaf disease categories, using 10,000 labeled images collected during a regular survey in Uganda. Traditionally, detection is done mostly through physical inspection and supervision of cassava plants in the garden by farmers or by agricultural extension workers from NAADS (National Agricultural Advisory Services), and findings are then reported to NARO (National Agricultural Research Organisation) for further analysis. However, this process is tiresome and capital intensive, and it cannot detect cassava infection in time for farmers to apply preventive techniques to non-infected plants. Timely detection would improve yields, which in turn enlarges the African food basket and strengthens food security against famine. Training CNNs (Convolutional Neural Networks) on the provided dataset to achieve high accuracy was very challenging for two reasons: the dataset was small, and it had high class imbalance, being heavily biased towards the CMD (Cassava Mosaic Disease) and CBSD (Cassava Brown Streak Virus Disease) classes. Class imbalance is problematic in machine learning and exists in many domains. Real-world datasets are rarely perfectly balanced. In recent years, a lot of research has addressed two-class problems such as fraudulent credit card detection and tumor detection, among others, whereas class imbalance in multi-class image datasets has received little attention. This paper therefore focuses on techniques to achieve an accuracy score of over 93% using class weights, SMOTE (Synthetic Minority Over-sampling Technique), and focal loss with deep convolutional neural networks trained from scratch. The goal was to counter high class imbalance so that the model can accurately predict underrepresented classes. [en_US]
dc.identifier.citation: Sambasivam, G., & Opiyo, G. D. (2021). A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural networks. Egyptian Informatics Journal, 22(1), 27-34. https://doi.org/10.1016/j.eij.2020.02.007 [en_US]
dc.identifier.uri: https://doi.org/10.1016/j.eij.2020.02.007
dc.identifier.uri: https://nru.uncst.go.ug/handle/123456789/9056
dc.language.iso: en [en_US]
dc.publisher: Egyptian Informatics Journal [en_US]
dc.subject: Agriculture [en_US]
dc.subject: Cassava mosaic detection [en_US]
dc.subject: Rectified Linear Unit [en_US]
dc.subject: Synthetic minority over-sampling technique [en_US]
dc.subject: Stochastic gradient descent [en_US]
dc.title: A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural networks [en_US]
dc.type: Article [en_US]
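
Illustrative note on two of the imbalance techniques named in the abstract (class weights and focal loss). This record does not give the paper's exact formulations, so the sketch below assumes the widely used inverse-frequency "balanced" weighting and the common focal-loss form -alpha * (1 - p_t)^gamma * log(p_t). The function names, the label counts, and the default gamma and alpha values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Assumed heuristic for illustration; rarer classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for a single sample.

    p_t is the predicted probability of the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified samples,
    and gamma = 0 reduces it to alpha-weighted cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Made-up label counts for illustration only (not the paper's data).
labels = ["cmd"] * 6 + ["cbsd"] * 3 + ["healthy"] * 1
weights = balanced_class_weights(labels)

# One confidently classified sample: true class 0 has the largest logit.
logits = [2.0, 0.5, 0.1]
ce = focal_loss(logits, target=0, gamma=0.0)  # plain cross-entropy
fl = focal_loss(logits, target=0, gamma=2.0)  # down-weighted easy sample
```

In this sketch the rare "healthy" class receives the largest weight, and the focal loss of the easy sample is smaller than its cross-entropy, which is how both techniques shift training emphasis toward underrepresented or hard examples. A per-class weight could also be passed as alpha to combine the two ideas.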