RESEARCH Open Access

© The Author(s) 2026. Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Taremwa et al.
Discover Artificial Intelligence (2026) 6:164
https://doi.org/10.1007/s44163-026-00855-7

Prediction of maize yield in Uganda using CNN-LSTM architecture on a multimodal climate and remote sensing dataset

Danison Taremwa1,5*, Emmanuel Ahishakiye2, Aggrey Obbo3, Paul Kategaya Kisozi4 and Fred Kaggwa5

*Correspondence: Danison Taremwa, taremwa.danison@gmail.com
1 Department of Computer Science, Kyambogo University, Kampala, Uganda
2 Department of Networks, Data Science and Artificial Intelligence, Kyambogo University, Kampala, Uganda
3 Department of Software Engineering, Mbarara University of Science & Technology, Mbarara, Uganda
4 Department of Environmental Science, Kyambogo University, Kampala, Uganda
5 Department of Computer Science, Mbarara University of Science & Technology, Mbarara, Uganda

Abstract
Maize is a staple crop in Uganda, underpinning both food security and rural livelihoods. Accurate forecasting of maize yields is therefore crucial for guiding agricultural planning, resource allocation, and policy design. Yet traditional statistical methods are often limited by low accuracy, poor scalability, and weak integration of diverse inputs, leaving them unable to capture the complex, nonlinear, and spatiotemporal dynamics of crop growth. To overcome these constraints, we developed a hybrid convolutional neural network and long short-term memory (CNN-LSTM) model. This model integrates remotely sensed climatic variables and vegetation indices with biannual maize yield records from Uganda’s Zonal Agricultural Research and Development Institute (ZARDI) zones for the period 2018–2020. Due to the scarcity of high-quality yield data, we applied the Synthetic Minority Over-sampling Technique for Regression with Gaussian Noise (SMOGN) alongside feature selection to balance the dataset and improve predictive robustness. Combined with feature selection and extensive hyperparameter tuning, the CNN-LSTM model outperformed the baseline models.
It achieved a Mean Squared Error (MSE) of 0.107 tonnes², a Mean Absolute Error (MAE) of 0.267 tonnes, a Root Mean Squared Error (RMSE) of 0.327 tonnes, and an R² score of 0.783. A comparative analysis revealed that the CNN + Random Forest (RF) model achieved an MSE of 0.137 tonnes², an MAE of 0.281 tonnes, an RMSE of 0.370 tonnes, and an R² score of 0.722. These results outperformed the standalone CNN (MSE = 0.216, R² = 0.562) and RF (MSE = 0.211, R² = 0.573) models, underscoring the advantage of combining spatial–temporal learning for improved predictive accuracy. Residual analysis further confirmed the model's stability, showing minimal bias and close agreement between observed and predicted yields. These findings highlight the potential for integrating spatial–temporal deep learning and ensemble methods to deliver accurate crop yield forecasts in data-limited smallholder systems. By offering a scalable framework for evidence-based farm planning and food security policy, our study demonstrated that advanced machine learning can directly support sustainable development in sub-Saharan Africa. Future research will extend the framework to incorporate Transformer architectures, high-resolution satellite imagery, and explainable AI, further enhancing accuracy, interpretability, and decision-support capacity.

1 Introduction
Crop yield prediction (CYP) provides an estimate of yield per unit area [1]. Yield prediction is a crucial strategy for farmers and the agricultural sector to efficiently manage resources and make informed decisions during crop growth and harvesting [2].
Knowing the expected yield before harvest aids resource planning, such as determining the optimal fertilizer use to achieve a high yield, scheduling labour, and preparing for packaging and manufacturing [3]. For instance, breweries rely on timely maize yield estimates to adjust procurement and production, demonstrating the industrial importance of accurate forecasts [4]. In sub-Saharan African (SSA) countries such as Uganda, over 60% of the population relies on maize for food and as a source of household income. Therefore, timely yield information is vital to safeguard food security and strengthen rural resilience [5]. However, yield forecasts are often unavailable before harvest, leading to price volatility that heightens economic uncertainty for farmers and policymakers [6]. These vulnerabilities are likely to worsen as climate variability increases and food demand rises, underscoring the need for accurate, flexible, and scalable yield forecasting systems as essential tools for mitigating risk and enhancing resilience [7].

Yield prediction remains challenging because crop productivity depends on complex nonlinear interactions among genotype, environment, and management factors that vary spatially and temporally [8]. Traditionally, estimating maize yield in Uganda has relied on farmers' intuition, statistical regression methods, and crop simulation models [9, 10]. While informative and interpretable, these approaches are subjective, difficult to transfer, and limited in their ability to capture these nonlinearities [11]. This may result in inaccuracies in maize yield information, which, in turn, could misinform decision-making along the supply chain and ultimately exacerbate food shortages [12]. Recent research emphasizes the need for improved accuracy, higher spatial resolution, and efficient algorithms that can learn from large, multimodal datasets to capture yield dynamics more reliably [13].
Satellite remote sensing and ground-based approaches have been found to complement each other in yield prediction. Together, they provide a scalable method for deriving timely and accessible observations of vegetation and climate over large areas [9, 14]. However, extracting and analysing actionable insights from high-dimensional imagery remains technically challenging for farmers and agricultural professionals [15].

Article highlights
• Developed a hybrid CNN-LSTM model that integrates remotely sensed climatic and vegetation indices to predict maize yields across Uganda’s ZARDI zones (2018–2020).
• Achieved high predictive accuracy (MSE = 0.107, explaining 78% of yield variation), outperforming standalone CNN, ensemble models such as RF, and CNN-RF.
• Introduced SMOGN-based data augmentation and feature selection techniques to overcome data sparsity, a novel approach for yield forecasting in smallholder, data-limited systems.
• Demonstrated that hybrid DL frameworks can inform scalable, data-driven agricultural planning, with potential to guide policymakers and strengthen food security strategies in SSA.
• Future work will focus on integrating Transformer architectures for improved sequence modelling alongside high-resolution imagery and explainable AI to enhance accuracy, interpretability, and practical decision support.

Keywords: Maize yield prediction, Ensemble learning, Precision agriculture, CNN-LSTM, Vegetation indices

Globally, machine learning (ML) and deep learning (DL) techniques have advanced CYP by uncovering complex, nonlinear patterns in remotely sensed data, resulting in accurate outputs [16, 17]. Machine learning models, such as Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost), have shown strong performance but often rely on hand-engineered features.
They can be prone to overfitting and may face efficiency and scalability issues as feature dimensionality and data complexity increase [18, 19]. These challenges become especially evident in agricultural applications, where crop growth dynamics are naturally represented as a time series. Each time point represents the crop’s physiological status at a specific developmental stage across diverse environments. Hence, the spatial–temporal nature of crop development necessitates advanced DL architectures, such as CNNs and LSTM networks, which can jointly capture sequential dependencies and spatial variability, thereby enhancing predictive performance [20]. The CNN component extracts localized vegetation patterns from satellite images, reflecting spatial variation, while the LSTM network learns long-term temporal dependencies across the growing season to support phenological analysis [21]. This synergy enables more accurate modeling of spatial–temporal crop yield dynamics, enhancing the accuracy of CYP [17].

Several studies have demonstrated this integrated approach, particularly in maize yield prediction [16, 17, 22]. For instance, studies [23] and [22] applied a CNN-LSTM model to predict maize yields in the US and China. Study [22] utilized high-resolution Unmanned Aerial Vehicle (UAV)-based multispectral and Light Detection and Ranging (LiDAR) data, with an attention mechanism that enhanced both interpretability and model robustness. In contrast, [23] employed MODIS imagery, meteorological time series, and soil attributes. Both studies achieved precise results, with R² values of 0.73 and 0.78, respectively, demonstrating the effectiveness of the CNN-LSTM architecture in capturing complex spatiotemporal and nonlinear interactions across different agro-ecological contexts.
Together, these outcomes provide compelling evidence for the broader applicability of ensemble and hybrid DL models, which have already been successfully deployed in CYP [21, 24], underscoring the power of crop forecasting. While these successes motivate their application to the maize yield problem in Uganda, the use of such models with remotely sensed data in smallholder intercropped farms in Uganda, where datasets are limited and spatial variability is high, remains largely underexplored. Moreover, models trained in high-input, homogeneous farming systems, such as the US Corn Belt, where abundant, reliable training data are available, often exhibit reduced performance in diverse agro-ecological zones, such as those found in Uganda. In these settings, domain shifts markedly diminish predictive accuracy, underscoring the need for model adaptation. This limitation is further exacerbated by the scarcity of high-quality long-term maize yield records across ZARDI zones [25, 26]. While acquiring large amounts of vegetation indices (VIs), soil characteristics, and climatic data can be relatively easy, collecting yield data in developing countries is costly and sporadic, relying heavily on smallholder farmers and typically producing only short-term time series for model development. For example, datasets collected from the Uganda Bureau of Statistics (UBOS) between 2018 and 2020 were limited, incomplete, and spatially fragmented, resulting in imbalanced data that would constrain model training and evaluation [27, 28]. Such limitations hinder the learning process of DL models, often leading to biased predictions and reduced reliability. These challenges underscore the need for DL models that are locally calibrated, scalable, and accurate for maize yield prediction in Uganda's data-scarce, intercropped farming systems.
In this study, we leverage the only consistent yield observations available and address this limitation by augmenting them with SMOGN, a regression-aware method that balances continuous outcomes. This approach enhances data diversity, mitigates imbalance in the yield distribution, and improves model robustness without distorting the underlying distributions. Consequently, the combined use of synthetic and original data broadens the training sample space, enhancing the efficiency, reliability, and generalizability of DL models across ZARDI zones with heterogeneous spatial and yield characteristics [29, 30].

To ensure robustness across dataset sizes, we deliberately employed a compact architecture, using a 1D-CNN and a single LSTM layer. The CNN extracts localized patterns from feature vectors at each time step, serving as an adaptive feature detector well-suited to tabular input, while the LSTM captures temporal dependencies across seasons. This hybrid CNN-LSTM model was strengthened through synthetic data augmentation, significantly improving maize yield prediction in Uganda and providing a scalable framework for evidence-based decision-making in smallholder agricultural systems. Hence, our specific objectives are: (i) to integrate multi-source data to compensate for the limited labels, (ii) to apply SMOGN to mitigate imbalance in the yield distribution, and (iii) to develop and validate a CNN-LSTM model for maize yield prediction in Uganda. Accordingly, the study's contributions can be highlighted in the three aspects below:

i. Multi-source data integration: we harmonized satellite VIs, climate variables, soil properties, and zonal yield records, covering Uganda’s nine ZARDIs (2018–2020), creating a comprehensive dataset for model development.
ii.
Applied SMOGN data augmentation methods to overcome sparsity, incompleteness, and imbalances in maize yield datasets, reducing skewness while preserving data distributions and improving predictive accuracy across diverse ZARDI zones.
iii. Developed and evaluated a CNN-LSTM model for data-scarce contexts, integrating synthetic augmentation with remote sensing and environmental data to capture intricate nonlinear spatial–temporal dependencies. The model, with hyperparameters optimized via grid search and cross-validation, effectively learns spatial and temporal patterns and outperforms CNN and CNN-Random Forest.

Together, these contributions deliver a resilient DL model for maize yield forecasting in Uganda’s smallholder intercropped systems. The approach provides both methodological innovation and practical decision-support value for data-scarce agricultural regions, with strong potential for extension to other crops and diverse agro-ecological contexts across SSA.

2 Related literature
2.1 Transformative impact of artificial intelligence and remote sensing in agriculture
Artificial intelligence (AI) and remote sensing are transforming agriculture by providing scalable, timely, and cost-effective methods for monitoring crop growth under the dual pressures of climate change and rising food demand [31]. Although adoption has advanced rapidly in developed countries, implementation across SSA remains limited, particularly among small-scale farmers who contribute 70% of the production. Enhancing crop production, monitoring diseases, managing fertilizer and irrigation, and optimizing harvesting are all challenging tasks that contribute to improving food security [3].
To meet these demands, remote sensing platforms, ranging from ground-based sensors (e.g., Internet of Things devices) to UAVs and satellites, provide vital information on crop growth, soil health, and climate variability [32]. Among these, satellite-based multispectral sensors such as Sentinel-2, Landsat, and MODIS are particularly valuable for large-scale monitoring, as they offer high resolution and continuous coverage at relatively low cost [33]. From these sensors, VIs including NDVI, EVI, LAI, NDWI, and CCI serve as robust indicators of crop vigor, water status, and chlorophyll content. At the same time, climatic and soil variables capture complementary environmental factors that influence yield. Integrating these multimodal datasets has consistently been demonstrated to improve yield prediction accuracy, as each source offsets the limitations of the others [34, 35]. Consequently, predicting future crop yields from diverse data sources is a complex task that requires hybrid models capable of capturing nonlinear spatial patterns and temporal dependencies, thereby enhancing prediction accuracy [23].

2.2 Deep learning and multisource datasets for crop yield prediction
Various studies have used multimodal datasets to achieve accurate, reliable results by combining multiple sources of information. For example, a study by [36] developed a Bayesian optimization-based LSTM (BO-LSTM) model to predict winter wheat yields, which outperformed conventional ML models such as SVM and LASSO, achieving an RMSE of 177.84 kg/ha and an R² of 0.82. Their comparative experiments demonstrated that using single inputs, such as GPP (R² = 0.72, RMSE = 186.13 kg/ha), LAI (R² = 0.67, RMSE = 221.32 kg/ha), and VIs (R² = 0.78, RMSE = 190.96 kg/ha), yielded lower performance. Combining meteorological data with GPP significantly improved performance (R² = 0.81; RMSE = 180.66 kg/ha). This highlights the strength of multimodal fusion in capturing crop-environmental interactions.
While multimodal integration offers the highest accuracy, temporal models also show strong potential. Study [20] demonstrated that even small sets of NDVI time-series inputs can perform well when trained with LSTM, achieving an RMSE of 505.78 kg/ha (NRMSE = 0.0726). This underscores LSTM’s strength in learning growth dynamics directly from sequential data, maintaining reasonable accuracy even under limited input conditions. Building on this, [23] developed a multilevel CNN-LSTM model that incorporated time-series remote sensing data and soil properties to predict corn yield across the U.S. Corn Belt states (2013 to 2016). The hybrid model achieved an RMSE of 1010.61, a MAPE of 7.97%, and an R² of 0.75, outperforming both traditional ML and standalone DL models, including the Deep Neural Network (DNN) model (RMSE of 1130.44, MAPE of 9.14%, and R² of 0.68). However, they observed that improvements from additional modalities plateaued, with phenological features proving more influential than climate variables.

More studies have focused on an ensemble learning approach with DL models. For example, [21] developed a stacked CNN-DNN ensemble with LASSO meta-learning, achieving the best performance across seasons in the U.S. Corn Belt (RMSE = 874 kg/ha, R² = 83.1% in 2019, and RMSE = 999 kg/ha, R² = 74.6% in 2020). By comparison, the baseline CNN-RNN model without ensemble improvements exhibited high error rates (RMSE of 1,007 kg/ha in 2019 and 1,092 kg/ha in 2020). Similarly, [37] employed a concatenation-based 2D-CNN-BiLSTM that fused Sentinel-1, Sentinel-2, and soil grids to predict corn yield in Iowa (2018 to 2021). Their model achieved an RMSE of 0.698 tonnes per hectare, a MAPE of 4.4%, and an index of agreement (D) value of 84.67%, clearly surpassing the baseline model, 2D-CNN, which had a D of 14.77%.
Both studies demonstrate the advantages of ensemble-hybrid pipelines in improving robustness and reliability. Finally, expanding the geographical focus, [17] employed a CNN-LSTM-Attention model to predict yields of maize, rice, and soybeans in Northeast China. The model achieved an R² of 0.80, an RMSE of 375.08, and a MAPE of 4.21%, outperforming both CNN (MAPE of 4.30%, R² of 0.77, and RMSE of 394.67) and LSTM. The inclusion of attention mechanisms enabled the extraction of higher-order features, addressing the complexity of high-dimensional agricultural data across time and space.

Taken together, these studies demonstrate that crop yield estimation has primarily been conducted in developed countries, where DL models such as CNNs and LSTMs have been applied to remote sensing data, achieving high accuracy [23, 36]. In these contexts, the accuracy, efficiency, and long-term usefulness of maize yield information have significantly improved, reflecting the advantages of homogeneous farming systems and well-documented agricultural practices. However, these conditions are difficult to replicate in the widely heterogeneous cropping patterns of Africa, which are characterized by smallholder-dominated systems [5]. Moreover, DL models often exhibit limited spatial transferability and remain highly location-specific due to domain shifts across diverse environments [25]. A further challenge DL faces is the need for extensive training datasets, which are difficult to acquire in developing countries, thereby constraining models' generalization in small data domains [37]. While remotely sensed data is widely available for maize yield prediction, access to high-quality ground-truth yield records remains limited in many developing countries because collecting them is costly and time-intensive.
To overcome these constraints, researchers have explored strategies such as dimensionality reduction [23] and the generation of synthetic data to augment sparse records [30], thereby improving model robustness and reducing overfitting. These approaches have demonstrated promise in enhancing the reliability of training processes under data-limited conditions. However, there remains an urgent need for research in SSA countries such as Uganda that leverages remotely sensed data and DL models to develop scalable and context-specific maize yield prediction systems. The following section discusses the materials and methods for model development.

3 Materials and methods
3.1 Dataset description
Building on prior evidence that multimodal integration enhances spatial–temporal yield estimation [23, 36, 37], this study employs a dataset combining vegetation indices with soil and climatic data and historical maize records across nine ZARDI zones. Remote sensing features were obtained from publicly available sources, including MODIS imagery (16-day composites at resolutions of 250–500 m) and Sentinel-2 (10 m). Climatic attributes collected from NASA POWER were first aggregated into daily data, then into seasonal values, aligned with Uganda’s two annual rain-fed maize production cycles (March–July and September–December), from 2018 to 2020 [38]. The vegetation indices were derived from MODIS products (e.g., MOD13Q1 for NDVI, EVI, and NDWI at 250 m resolution), with LAI taken from MOD15A2H (500 m) and CCI from Copernicus Sentinel-2 (10 m); a cropland mask was also used. Rainfall data were obtained from NASA GPM-IMERG (daily precipitation, aggregated by season), solar radiation and soil moisture data came from NASA POWER/ERA5, and maximum and minimum temperature data were derived from NASA POWER.
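The daily-to-seasonal aggregation described above can be sketched in a few lines. This is a minimal illustration rather than the study's pipeline: it assumes daily records arrive as (date, value) pairs, uses a seasonal mean (a seasonal total may be preferable for rainfall), and the names `SEASONS`, `season_of`, `seasonal_means`, and the labels "A" and "B" are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Season windows taken from the text: March-July and September-December.
# The labels "A" and "B" are illustrative names, not the study's.
SEASONS = {"A": range(3, 8), "B": range(9, 13)}


def season_of(day):
    """Return the season label for a date, or None outside both windows."""
    for label, months in SEASONS.items():
        if day.month in months:
            return label
    return None


def seasonal_means(daily_records):
    """Average daily (date, value) pairs into one value per (year, season)."""
    buckets = defaultdict(list)
    for day, value in daily_records:
        label = season_of(day)
        if label is not None:  # skip days outside the two growing seasons
            buckets[(day.year, label)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Toy daily values, made up for illustration only.
records = [
    (date(2018, 3, 1), 4.0),
    (date(2018, 7, 31), 6.0),
    (date(2018, 8, 15), 9.0),  # falls between the seasons, so it is ignored
    (date(2018, 9, 1), 2.0),
    (date(2018, 12, 31), 8.0),
]
print(seasonal_means(records))
```

Keying the result by (year, season) gives one row per growing season, which matches the biannual granularity of the yield records.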
Both NDVI and EVI capture crop vigor by reflecting biomass expansion and vegetation health. In parallel, LAI quantifies canopy structure, linking light interception, photosynthesis, and transpiration to biomass accumulation and yield formation. NDWI adds a water dimension by detecting canopy moisture and drought stress. At the same time, CCI reflects chlorophyll content and photosynthetic activity, both of which are directly tied to grain set and productivity. Management practices, such as fertilizer application, enhance canopy vigor, leaf area, and chlorophyll concentration, while pest and disease pressures reduce greenness and water content, leading to declines in NDVI, EVI, CCI, and NDWI [17, 39]. Together, these indices provide complementary insights into crop conditions. When integrated with climatic variables, such as rainfall, solar radiation, maximum and minimum temperatures, and soil moisture, they capture the key biophysical drivers of crop performance, regulating growth, phenology, and water availability. This integrated perspective forms a robust foundation for reliable yield prediction [10].

The maize yield datasets (measured in tons per hectare) used in this study were sourced from the Uganda Bureau of Statistics (UBOS1). UBOS utilized standardized procedures through the Annual Agricultural Survey (AAS), thereby ensuring the data is nationally representative and quality-assured across all ZARDI zones. The use of UBOS, the national custodian of agricultural data, adds credibility, precision, and reliability to the empirically tested maize yield data used herein. These data were collected as part of the 50 × 2030 Initiative, an international effort led by the FAO and the World Bank to address agricultural data gaps in 50 countries by 2030. While the AAS provided standardized, nationally representative data, financial and logistical constraints prevented further follow-up localized surveys within the ZARDI zones after 2020 [27].
All 13 predictor variables, including Year, Rainfall, Solar Radiation, Max-Temp, Min-Temp, Humidity, Soil Moisture, NDVI, EVI, NDWI, CCI, LAI, and Cropland Fraction, were used in the study. These geographically diverse variables, spanning all ZARDI zones, enabled both spatial and temporal analyses of maize yield trends. The details of the data sources and variables employed are listed in Table 1.

1 https://www.ubos.org/

3.2 Data pre-processing
Data pre-processing was one of the most critical steps in preparing this data for efficient, reliable predictive modeling of maize yields. Processing of predictors was first conducted using Google Earth Engine (GEE), where variables were sampled at 500 m intervals, and VIs and environmental variables were standardized to monthly intervals. Within each ZARDI zone, a satellite-derived annual cropland mask derived from MODIS MCD12Q1 for each specific year from 2018 to 2020 was applied to delineate active cropland pixels. Within these masked cropland areas, sampling was further restricted to grid cells associated with maize production, as per UBOS Annual Agricultural Survey records and ZARDI agronomic reports. Vegetation indices (NDVI, EVI, LAI, NDWI, CCI) and environmental variables (rainfall, temperature, soil moisture, radiation) were then sampled at 500-m intervals only within these maize-specific pixels, and the resulting values were aggregated to seasonal means per ZARDI zone, aligning with Uganda’s two annual maize seasons. This spatial filtering ensured that the aggregated inputs represented maize-specific conditions in each zone.
Table 1 Sources of data

| Data Type | Variables | Temporal Resolution | Spatial Resolution | Data Source | Link |
| Weather data | Temperature, solar irradiance, and humidity | Daily (varies) | ~0.5° × 0.5° (~50 km) | NASA POWER | https://power.larc.nasa.gov/ |
| Rainfall | Precipitation (IMERG) | 2014–present, daily | 0.1° × 0.1° (~10 km) | NASA GPM (IMERG) | https://pmm.nasa.gov/resources/documents/gpm-integrated-multi-satellite-retrievals-gpm-imerg-algorithm-theoretical-basis |
| Solar radiation | All-Sky Surface Photosynthetically Active Radiation (PAR) | Daily, near real-time | ~1° (CERES SYN1deg/FLASHFlux) | NASA CERES & FLASHFlux via POWER | https://ceres.larc.nasa.gov/data/ |
| Soil moisture | Root zone soil wetness (0–100 cm depth) | 2014–present, daily | ~0.5° (~50 km) | NASA GMAO, MERRA-2 (GEOS DAS) | https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/ |
| VIs | NDVI, EVI (MOD13Q1 V6.1); CCI (Copernicus S2 Harmonized) | 2018–2020 (seasonal) | 250 m (NDVI/EVI), 10 m (CCI) | MODIS (EOSDIS), Copernicus Sentinel-2 | https://lpdaac.usgs.gov/products/mod13q1v061/ ; https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED |
| Photosynthetically active indices | LAI (Leaf Area Index, MOD15A2H V6.1) | 2014–2023, 8-day | 500 m | MODIS | https://doi.org/10.5067/MODIS/MOD15A2H.061 |
| Land cover (masking) | Cropland areas (≥ 60% cultivated cropland, Band 12) | Annual | 500 m | MODIS MCD12Q1 V6.1 | https://lpdaac.usgs.gov/products/mcd12q1v061/ |

To improve data reliability, the zonal-level maize yield records from AAS were first cross-checked against independent regional statistics for consistency. Anomalies were corrected during data cleaning to provide a reliable ground truth, essential for model training. Subsequently, categorical variables, such as ZARDI zones, were encoded using one-hot encoding to ensure ML compatibility without introducing false ordinal relationships. At the same time, temporal variables (e.g., year) were converted to numerical values to retain sequential order by extracting the year component, thereby ensuring compatibility with predictive algorithms.
Continuous features, such as NDVI, EVI, LAI, NDWI, rainfall, solar radiation, soil moisture, and temperature, were standardized using the StandardScaler to a mean of zero and a variance of one. This step eliminated disparities in variable magnitudes and prevented model bias toward features with larger numerical ranges. To address imbalance in the yield distribution, synthetic samples were generated using SMOGN [29], which expanded the dataset while preserving feature-target relationships, thereby reducing bias from underrepresented yield ranges.

Missing values, if present, were imputed with median values in numerical columns to minimize information loss and ensure robustness against outliers. Furthermore, interaction terms were engineered, such as those between vegetation and climatic variables (e.g., NDVI and rainfall), to capture nonlinear relationships and enhance expressiveness in the feature space. To reduce dimensionality, an RF model was employed to rank features by importance. We retained the top 10 features (out of the initial 15) based on this ranking, including variables such as YEAR, MAX_TEMP, NDVI, and rainfall, which collectively accounted for the most variance in yield. Removing less influential features helped minimize noise and multicollinearity, ultimately improving model training efficiency. The final dataset was split into training and test sets at an 80:20 ratio to ensure good generalization and unbiased model performance. This split strategy, widely used in yield prediction studies, strikes a balance between the need for sufficient training data and the requirement for reliable test evaluation. Within the training set, we performed five-fold cross-validation to optimize model parameters and assess variability. This approach maximizes the use of the limited data while guarding against overfitting.
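The standardization, 80:20 split, and five-fold partition described above can be illustrated with a plain-Python stand-in for StandardScaler. This is a sketch under stated assumptions, not the study's code: the function names are hypothetical, the data are toy values, and fitting the scaling statistics on the training portion only is a choice made here that the text does not specify.

```python
import random
from statistics import fmean, pstdev


def split_80_20(rows, seed=0):
    """Shuffle row indices with a fixed seed and split them 80:20."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(rows) * 4 // 5
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


def fit_standardizer(train_column):
    """Return (mean, std) estimated on the training values only."""
    mu = fmean(train_column)
    sigma = pstdev(train_column) or 1.0  # guard against constant columns
    return mu, sigma


def standardize(column, mu, sigma):
    """Apply z = (x - mu) / sigma with previously fitted statistics."""
    return [(x - mu) / sigma for x in column]


def five_folds(n_rows, k=5):
    """Partition range(n_rows) into k nearly equal validation folds."""
    return [list(range(n_rows))[i::k] for i in range(k)]


# Toy rainfall values, made up for illustration only (20 rows).
rainfall = [float(v) for v in range(100, 200, 5)]
train, test = split_80_20(rainfall)
mu, sigma = fit_standardizer(train)
train_z = standardize(train, mu, sigma)
test_z = standardize(test, mu, sigma)  # reuse the training statistics
folds = five_folds(len(train))
```

The population standard deviation (`pstdev`) is used because StandardScaler also divides by the population, not the sample, standard deviation.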
The resulting pre-processed dataset, enriched with synthesized data and engineered features, captured spatial, temporal, and environmental variability. This foundation supports the development of ML models with high predictive accuracy for maize yields across diverse agro-ecological regions in Uganda. The section that follows describes SMOGN oversampling as a crucial pre-processing step to improve the model's performance.

3.2.1 SMOGN oversampling for imbalanced yield regression data

The lack of observations at the extremes of the target variable, such as very low or very high yields, hampers the model's ability to learn those conditions [29]. SMOGN was applied as a pre-processing step to balance the yield distribution. It generates new samples in under-represented regions by interpolating between minority examples and their nearest neighbours, then adding Gaussian noise to increase variability. SMOGN also performs a mild under-sampling of the majority (most common) yield range to avoid biasing the model towards the middle of the distribution. We preferred SMOGN over simpler oversampling methods, such as SMOTER alone, because it better preserves continuous target relationships and has demonstrated superior performance on small, imbalanced regression datasets [40]. The process involves inputting the original dataset, applying SMOGN to create a balanced dataset, and then training the model on this new balanced data [29]. The pseudocode for the SMOGN technique is shown in Algorithm 1 [41].

Algorithm 1: Pseudocode for SMOGN.

3.2.2 Application and limitations of SMOGN oversampling

After SMOGN, the training set increased to 295 samples, ensuring that extremely high and low yields were adequately represented.
We verified that the augmented yields remained within realistic bounds, without introducing implausible extreme values. Oversampling preferentially added samples to previously sparse yield ranges, such as those below 1.0 t/ha and above 2.5 t/ha, resulting in a more uniform distribution. The synthetic data were generated with careful parameter tuning, using a neighbour count of k = 5, a Gaussian noise level of 0.01, and 100% oversampling of minority instances, following the guidelines of Branco et al. [29]. To reduce the risk of unrealistic yield values being introduced during augmentation, we constrained the generated values to the biologically plausible range of maize yields observed in Uganda (0.5–4.5 t/ha). This ensured that synthetic samples remained consistent with known agronomic conditions. Additionally, we visually examined yield distribution plots before and after augmentation. The post-SMOGN distribution was more uniform while still following the natural trends of the original dataset, indicating that no extreme outliers were created. As a result of this pre-processing step, the dataset's imbalance was significantly reduced while realistic feature-target relationships were maintained, an approach validated in recent studies [41, 42]. Despite these improvements, SMOGN remains a heuristic method and may still introduce some artefactual values in edge cases; therefore, careful post-hoc validation and visualization remain essential.

Its assumption of local smoothness further limits the ability to capture complex, nonlinear, high-dimensional interactions in a multimodal dataset [43]. As a result, while suitable for moderately structured data, SMOGN may underperform in contexts with discontinuities or latent heterogeneity, where hybrid approaches incorporating generative or manifold-based methods could offer more robust alternatives.
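To make the augmentation settings concrete, the following is a deliberately simplified sketch of SMOGN-style sample generation: nearest-neighbour interpolation, a small Gaussian perturbation, and clipping to the plausible yield range. It is not the SMOGN reference implementation (which, among other things, decides per sample between interpolation and noise and also under-samples common cases). The function name, the toy data, and the seed are hypothetical.

```python
import math
import random


def synthesize(rare, k=5, noise=0.01, low=0.5, high=4.5, seed=0):
    """Create one synthetic sample per rare case (i.e. 100% oversampling).

    `rare` is a list of (features, yield) pairs with equal-length feature lists.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, (x, y) in enumerate(rare):
        # k nearest neighbours of x among the other rare cases (Euclidean distance).
        others = [case for j, case in enumerate(rare) if j != i]
        others.sort(key=lambda case: math.dist(x, case[0]))
        nx, ny = rng.choice(others[:k])
        # Interpolate at a random point between the case and its neighbour,
        # then add a small Gaussian perturbation.
        t = rng.random()
        new_x = [a + t * (b - a) + rng.gauss(0.0, noise) for a, b in zip(x, nx)]
        new_y = y + t * (ny - y) + rng.gauss(0.0, noise)
        # Keep the synthetic yield inside the plausible range (t/ha).
        synthetic.append((new_x, min(max(new_y, low), high)))
    return synthetic


# Hypothetical rare cases: (standardized features, yield in t/ha).
rare_cases = [
    ([0.1, -1.2], 0.7),
    ([0.3, -0.9], 0.9),
    ([1.4, 1.1], 3.4),
    ([1.1, 1.5], 3.6),
]
augmented = synthesize(rare_cases, k=1)
print(len(augmented), "synthetic samples generated")
```

The defaults mirror the settings reported above (k = 5, noise level 0.01, bounds of 0.5–4.5 t/ha); the toy call uses k = 1 only because the example contains four points.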
Deep generative models, particularly Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), have recently become established as the state of the art for generating synthetic data for crop yield datasets. However, given our limited sample size, training a GAN or VAE reliably would be challenging. GANs typically require thousands of examples to capture the data distribution accurately and avoid mode collapse [44].

3.3 The proposed model

Figure 1 illustrates the architecture of the CNN-LSTM model for maize yield prediction using multimodal climate and remote-sensing data. The process begins with multiple input streams for each maize-growing season, including satellite-derived indices (e.g., EVI and NDVI) and temporal climate variables (e.g., temperature and rainfall) spanning the season. After pre-processing, these inputs (Xt) are passed to the 1D-CNN, which extracts spatial features from the sequence (e.g., identifying patterns in NDVI that correlate with crop biomass). The CNN processes the inputs, yielding a spatially enriched feature vector summarizing patterns at each time step. This sequence of feature vectors is then fed into the LSTM module, enabling the system to learn how these features evolve from one time step to the next. The LSTM component captures seasonality, the progression of growth stages, and time-dependent behaviours, such as the impact of early-season rainfall on mid-season vegetation health, which are crucial for crop performance [45]. The LSTM cell operates through forget (ft), input (it), and output (ot) gates, each using a sigmoid activation (σ) to regulate the flow of information through the cell state (ct). The candidate memory cell (gt) uses a tanh activation to propose updates to the cell state.
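For reference, the gate operations described above are usually written as follows in the standard LSTM formulation. The weight matrices W and U and the bias vectors b are generic notation assumed here rather than symbols defined in this paper; x_t stands for the CNN feature vector at time step t, and ⊙ denotes element-wise multiplication.

$$
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
g_t &= \tanh\left(W_g x_t + U_g h_{t-1} + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
$$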
The gates determine which parts of the past to keep or discard, how much new input to take in at any given time, and what to save and pass on. The LSTM's output at each time point t is denoted Ht, and the hidden state (ht) encodes the learned temporal relationships, serving as a distilled representation of all relevant historical data leading up to the current time step [46]. The output Ht is then passed to a fully connected layer that acts as a spatiotemporal integrator, transforming the aggregated feature vector into a form suitable for prediction. Finally, this representation is passed to the yield estimation layer, which is implemented as a dense output layer with a linear activation. This layer performs the regression task and produces the predicted maize yield, measured in tonnes per hectare, as the final output [21, 23, 24].

Fig. 1 The architecture of the proposed CNN-LSTM

3.3.1 Hyperparameter tuning strategy

This work used a suite of ML and DL models, each tuned via hyperparameter optimization, to evaluate the performance of various modeling paradigms for forecasting maize yield. The models were refined using a grid search with five-fold cross-validation on the training dataset of 295 records produced by SMOGN. Hyperparameter optimization was performed separately for the DL and baseline ML algorithms. The CNN-LSTM model was deliberately kept shallow (with only one CNN layer and one LSTM layer) to minimize the risk of overfitting on the limited data. A grid search was used to explore combinations of convolutional filters (16, 32, 64), kernel sizes (3, 5, 7), LSTM units (32, 64, 128), learning rates (0.001, 0.0005), and batch sizes (16, 32).
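As a rough indication of the size of this search, the grid listed above can be enumerated as follows. The dictionary keys are illustrative names, not identifiers from the study's code.

```python
from itertools import product

# Search space as listed in the text; the key names are illustrative.
grid = {
    "filters": [16, 32, 64],
    "kernel_size": [3, 5, 7],
    "lstm_units": [32, 64, 128],
    "learning_rate": [0.001, 0.0005],
    "batch_size": [16, 32],
}

# One dict per candidate configuration (Cartesian product of all options).
configs = [dict(zip(grid, combo)) for combo in product(*grid.values())]

n_folds = 5
print(len(configs), "configurations,", len(configs) * n_folds, "fits with five-fold CV")
```

The grid contains 3 × 3 × 3 × 2 × 2 = 108 configurations, so five-fold cross-validation implies 540 model fits, which is tractable for a model with a single convolutional and a single recurrent layer.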
Model selection was based on the mean validation loss across five-fold cross-validation, ensuring that the test set remained unseen during hyperparameter optimization. To prevent overfitting, early stopping (patience of 20 epochs) and dropout regularization (rate of 0.3) were implemented. The final architecture of the CNN-LSTM was intentionally kept simple, with just a single convolutional and a single recurrent layer, considering the dataset size. For the Random Forest algorithm, hyperparameters such as the number of estimators (100, 200, 500), the maximum depth of the tree (5, 10, 15), and the learning rate (0.05, 0.1, 0.2 for boosting) were tuned using grid search with five-fold cross-validation. This resulted in a balanced architecture that generalizes well while minimizing the risk of overfitting.

3.3.2 Model architectures and final configurations

RF was used as a baseline due to its robustness to high-dimensional data, fast training time, and ability to model nonlinear interactions between inputs. The optimal hyperparameter configuration consisted of 200 trees, a maximum depth of 20, and a minimum sample split of 2. The CNN model was used as another baseline. It comprised two 1-D convolutional layers with 32 and 64 filters and a kernel size of 3, with ReLU activation to introduce non-linearity. These were followed by a max pooling layer to reduce dimensionality and a dense regression output layer. Training minimized the mean squared error (MSE) loss using the Adam optimizer with a learning rate of 0.001, for up to 50 epochs with a batch size of 8. A hybrid CNN–RF model was also used as a comparative model. The CNN block consisted of one 1-D convolutional layer with 64 filters of kernel size 3, followed by batch normalization, a ReLU activation, a max pooling layer, and a dropout layer with a rate of 0.3.
The feature maps extracted by the CNN block were flattened and passed to an RF regressor with 100 decision trees. Each tree was trained on a bootstrapped subset of the data to enhance generalization. The final prediction was generated by averaging the outputs of all trees, allowing the model to perform nonlinear regression while maintaining interpretability and robustness to noise.

The proposed CNN-LSTM model was developed to learn the spatial features from climate inputs and the seasonal temporal dynamics of environmental variables. The architecture began with a 1-D convolutional layer featuring 64 filters of kernel size 2, followed by a max pooling layer and a dropout layer with a rate of 0.3 to reduce overfitting. The extracted spatial features were fed into an LSTM layer with 100 units, enabling the model to learn long-range sequential dependencies. The CNN and dense layers adopted ReLU activation, while sigmoid and tanh activations were employed in the LSTM for gate operations and memory cell updates. A fully connected dense layer provided the final regression output with linear activation. The model was optimized using the Adam optimizer with a learning rate of 0.001 and MSE loss. Early stopping was implemented with a patience of 20 epochs to prevent overfitting, and a 20% validation split was applied. Optimization was performed with a batch size of 16 for up to 50 epochs (Table 2).

3.4 Performance metrics

The performance of the proposed model was evaluated using four key metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the R² score. Together, these metrics give a comprehensive assessment of the model's predictive accuracy. The MAE was chosen for its intuitive interpretation as the average magnitude of the prediction errors, providing a simple measure of accuracy that does not overweight larger errors [47, 48].
MSE was included because it squares the prediction errors, thereby emphasizing larger errors, which makes it especially useful for identifying models that minimize large deviations [18]. The RMSE is the square root of the MSE and expresses the error in the same units as the target variable, maize yield, making it easy to interpret [49]. Lastly, the R² score was selected to examine the proportion of variance in maize yield explained by the model. It provides a normalized measure of predictive power that accounts for dataset variability. Together, these metrics provide a balanced assessment of the model's accuracy and reliability across Uganda's ZARDI zones. The MAE, MSE, RMSE, and R² score are shown in Eqs. 1–4, respectively [50].

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{1}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{3}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{4}$$

where $y_i$ is the actual value for the i-th observation, $\hat{y}_i$ is the predicted value for the i-th observation, $\bar{y}$ is the mean of the actual values, and $n$ is the total number of observations.

Table 2 Summary of optimal hyperparameters for each model

| Model | Key hyperparameters | Final configuration |
| --- | --- | --- |
| Random Forest (RF) | Number of estimators, maximum depth, minimum samples split | Estimators = 200; Max depth = 20; Min split = 2 |
| CNN | Conv layers, filters, kernel size, activation, pooling, optimizer, training setup | 2 Conv1D layers (32, 64 filters); Kernel size = 3; ReLU activation; Max pooling; Dense output; Adam optimizer (LR = 0.001); MSE loss; 50 epochs; Batch size = 8 |
| CNN–RF Hybrid | CNN block + RF regressor | Conv1D (64 filters, kernel = 3); Batch normalization; ReLU; Max pooling; Dropout = 0.3; Flatten → RF regressor (100 trees); Output = averaged predictions |
| CNN-LSTM | Conv filters, kernel size, LSTM units, dropout, activations, optimizer, training setup | Conv1D (64 filters, kernel = 2); Max pooling; Dropout = 0.3; LSTM (100 units); Dense output (linear); Adam optimizer (LR = 0.001); MSE loss; 50 epochs; Batch size = 16; Early stopping (patience = 20) |

3.5 Model implementation

The proposed model for predicting maize yield was implemented in Google Colab, a cloud-based platform that supports collaborative coding and provides hosted computing resources. The implementation is written in Python 3.10; key libraries include Pandas and NumPy for data manipulation and pre-processing, Scikit-learn for model training and evaluation, and Matplotlib for visualization. Pre-processing included scaling continuous features with StandardScaler to ensure consistent scaling across variables. The dataset was split 80:20 into training and test sets to ensure unbiased evaluation.

4 Experimental results

4.1 Performance of the proposed model

The performance metrics of the evaluated models provide insight into their predictive capability for maize yield, as shown in Table 3. The CNN-LSTM surpassed the CNN and Random Forest in all comparisons. Its R² was approximately 0.7833, its RMSE approximately 0.33 t/ha, and its MAE roughly 0.27 t/ha. The model thus explained approximately 78.33% of the variance in the maize yield data with the lowest prediction error, making it the most accurate model among those tested. These results marked a substantial improvement over the CNN (R²≈0.56, RMSE≈0.46 t/ha, MAE≈0.37 t/ha) and the Random Forest (R²≈0.57, RMSE≈0.46 t/ha, MAE≈0.31 t/ha). The CNN + RF ensemble also performed well (R²≈0.72, RMSE≈0.37 t/ha, MAE≈0.28 t/ha), demonstrating that combining spatial information extraction with a nonlinear regressor yields better results than using either model individually.
This means that although Random Forest captured part of the relationship in the data, it was less accurate on its own than the ensemble and the CNN-LSTM. Similarly, the CNN model may not adequately capture the full range of temporal dependencies in the data. These findings underscore the importance of combining complementary modeling techniques to improve predictive accuracy in agricultural yield forecasting. Notably, these performance gains were achieved without overfitting the data; the CNN-LSTM's hyperparameters were tuned via cross-validation, and its training was regularized with dropout and early stopping, supporting generalization to unseen data.

Table 3 Model performance results (calibration and validation)

| Model | Calibration MSE | Calibration RMSE | Calibration R² | Validation MSE | Validation MAE | Validation RMSE | Validation R² |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN | 0.1985 | 0.4456 | 0.5894 | 0.2158 | 0.3735 | 0.4646 | 0.5622 |
| CNN + Random Forest | 0.1224 | 0.3499 | 0.7510 | 0.1368 | 0.2809 | 0.3699 | 0.7225 |
| Random Forest | 0.1942 | 0.4407 | 0.6018 | 0.2105 | 0.3054 | 0.4588 | 0.5730 |
| Proposed CNN-LSTM | 0.0956 | 0.3092 | 0.8125 | 0.1068 | 0.2667 | 0.3268 | 0.7833 |

4.2 Comparison between the actual maize yields and the predictions of different models

Figure 2 plots the actual maize yields (blue line) against the predictions of the different models: CNN (purple line), Random Forest (green line), CNN-LSTM (red line), and the ensemble model (CNN + Random Forest, orange line). The performance of each model is analysed based on how closely it follows the trend in actual maize yields across the sample indices. The CNN-LSTM accurately
tracked actual yields across the full range, including both high- and low-yield examples, whereas the CNN and RF often exhibited systematic biases. For instance, one of the highest observed yields, at sample index 6, was 3.61 t/ha; the CNN-LSTM predicted approximately 3.13 t/ha, the CNN underpredicted at around 2.99 t/ha, and the RF came closest at ~3.55 t/ha, slightly below the observed value. In a low-yield case (actual 1.40 t/ha), the CNN-LSTM's prediction was ~1.99 t/ha, compared to the CNN's 2.29 t/ha and the RF's 2.36 t/ha. These examples illustrate that the CNN-LSTM's errors at the extremes (roughly 0.5–0.6 t/ha here) were smaller than those of the CNN in both cases and smaller than those of the RF in the low-yield case. In general, the CNN tended to underestimate peak yields, while the RF sometimes overestimated lower yields; the hybrid CNN + RF ensemble mitigated some errors but still lagged behind the CNN-LSTM's accuracy. It follows that the CNN-LSTM model is the most reliable predictor overall, as its trend closely follows the actual maize yields. This demonstrates its potential for predicting maize yields from complex agricultural datasets.

4.3 Comparison of CNN, random forest, CNN-LSTM, and CNN-random forest

Figure 3 compares the four models, CNN, Random Forest, CNN-LSTM, and CNN-Random Forest, using MSE, MAE, RMSE, and R² score for predicting maize yield in tonnes per hectare. The CNN (with 1D convolutional layers) represented a basic DL approach that used the same input format; it achieved a moderate R² score of 0.56, indicating that temporal dependencies were not fully captured by the CNN alone. The RF represented a classic ML approach that performed similarly to the CNN, with an R² of 0.57, highlighting that a non-temporal model can capture some relationships but misses sequential effects.
Building on these baselines, the CNN-Random Forest ensemble performed well, with an MSE of 0.137, an MAE of 0.281 t/ha, an RMSE of 0.370 t/ha, and an R² score of 0.722. This validated the idea of integrating spatial feature extraction with a nonlinear regressor. This ensemble's success motivated the development of the combined CNN-LSTM model, which outperformed all other models with the lowest MSE of 0.107, MAE of 0.267 t/ha, and RMSE of 0.327 t/ha, and the highest R² score of 0.783. This reveals its strong ability to extract spatial and temporal patterns from the data. Its predictions deviate little from the actual values; for instance, for an actual yield of 2.622 t/ha it predicted 2.501 t/ha. This proved to be an effective combination, leveraging the complementary strengths of both components while suiting our data-scarce setting. By contrast, single models such as the CNN and RF showed higher errors and lower R² scores, indicating less accurate predictions.

These four models were selected for comparison in predicting maize yield in Uganda because they are applicable and relevant to the available data. While gradient boosting algorithms, such as XGBoost and CatBoost, often surpass RF on tabular datasets [45], they are not inherently designed for learning spatial–temporal sequences. Applying them in this study would have required extensive feature engineering to capture these dynamics, rendering them less practical in this context. We also recognize that more advanced architectures, such as CNN-Attention-LSTM, BiLSTM, 3D-CNN, and Transformer-based architectures, could further enrich the comparison given their ability to capture spatial and temporal dependencies [37, 51]. However, they were excluded due to the limited size of the available training dataset and the substantial computational
demands they entail. For instance, BiLSTM, which processes sequences both forward and backward, would roughly double the number of trainable parameters, increasing the risk of overfitting to our relatively limited time-series data. Likewise, a 3D-CNN was not employed, as it would require high-resolution spatial–temporal grids rather than our categorical zones, and substantially larger datasets to support its parameter complexity. Similarly, transformer models have shown promise in capturing long-range dependencies in crop yield data; however, they typically require extensive training data and substantial computational resources [17, 52].

Fig. 2 Comparison between the actual maize yields and the predictions. The sample index is an arbitrary sequential label assigned to each unique zone-season observation in the yield dataset (2018–2020). Each index corresponds to a single zone-season record, ordered chronologically

Fig. 3 The comparison of CNN, Random Forest, CNN-LSTM, and CNN-Random Forest

Consequently, the benchmarking focused on strong, realistic models rather than exhaustively examining all possible architectures. The models were trained and evaluated using the same standardized data, which merged remotely sensed climatic variables and VIs with maize yields in Uganda's ZARDI zones for 2018–2020. The SMOGN technique was employed to address the natural imbalance in the yield data. The controlled experimental setting adds weight to the conclusions and underscores the CNN-LSTM model's potential to handle spatiotemporal patterns. Overall, the results indicate that the CNN-LSTM model provides the most accurate predictions, followed by the CNN-RF ensemble. These models can therefore help improve maize yield forecasting in agricultural decision-making.
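As a worked illustration of Eqs. 1–4, the sketch below computes the four metrics for the CNN-LSTM using the actual and predicted yields listed in Table 4. The function and variable names are illustrative and are not taken from the study's code.

```python
import math

# Actual yields and CNN-LSTM predictions (t/ha), copied from Table 4.
actual = [3.61, 1.9, 1.4, 2.994, 1.6, 2.622, 3.593, 2.4, 2.385, 2.2, 2.996]
predicted = [3.128, 2.08, 1.993, 3.093, 1.854, 2.501, 3.164, 2.457, 2.529, 1.694, 3.064]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 as defined in Eqs. 1-4."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)      # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On these eleven rows the metrics come out at roughly MAE 0.27 t/ha, MSE 0.107, RMSE 0.33 t/ha, and R² 0.78, in line with the validation figures reported for the CNN-LSTM in Table 3.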
4.4 Predicted values of maize yield in tonnes per hectare for various models

Table 4 presents the predicted maize yields (tonnes per hectare) for the following models: CNN, CNN-LSTM (proposed), Random Forest, and CNN + Random Forest. These are compared with the actual yield for each observation to assess performance. The proposed CNN-LSTM gave the closest overall approximation of the actual values. For example, for an actual yield of 3.610 tonnes/hectare, the CNN-LSTM estimated 3.128 tonnes/hectare. For an actual value of 2.622 tonnes/hectare, the CNN-LSTM estimate was 2.501 tonnes/hectare, a small deviation. This demonstrates the ability to capture spatial and temporal patterns in this dataset. While the Random Forest predictions were strong, there was some overprediction. For instance, an actual yield of 1.9 tonnes/hectare was predicted as 2.397 tonnes/hectare. The ensemble combined the strengths of the individual models. For example, the CNN + Random Forest ensemble predicted 3.341 tonnes/hectare for an actual yield of 3.610 tonnes/hectare and 2.239 tonnes/hectare for an actual yield of 1.9 tonnes/hectare, which generally reduced prediction errors relative to the standalone models. These findings highlight the importance of integrating models to leverage their complementary strengths for reliable agricultural yield predictions.

4.5 The relationship between the predicted maize yields and the actual maize yields

Figure 4 shows a scatter plot of the maize yields predicted by the CNN-LSTM model (blue dots) against the actual values from the dataset, in tonnes per hectare. The red dashed line represents the ideal fit, on which the predicted values equal the actual values.
The predicted yields in the plot are typically close to the ideal line, reflecting the model's ability to approximate the actual values. For instance, an actual yield of about 2.5 tonnes/hectare corresponds to a prediction of approximately 2.6 tonnes/hectare, a small deviation. Similarly, predictions closely match actual values for a low yield of around 1.5 tonnes/hectare. However, there are minor deviations from the ideal fit. For example, at an actual yield of 2.0 tonnes/hectare, the predicted yield is slightly overestimated at about 2.2 tonnes/hectare. These minor discrepancies notwithstanding, the general scatter of the points around the red dashed line indicates that the model captures the underlying structure of the maize yield data. The points cluster closely around the ideal fit line, further indicating that the CNN-LSTM is well suited to this maize yield prediction problem.

Table 4 Predictions of maize yields (tonnes per hectare)

| Actual | CNN Predictions | CNN-LSTM Predictions | Random Forest Predictions | Ensemble (CNN-Random Forest) |
| --- | --- | --- | --- | --- |
| 3.61 | 2.999 | 3.128 | 3.553 | 3.341 |
| 1.9 | 2.556 | 2.08 | 2.397 | 2.239 |
| 1.4 | 2.291 | 1.993 | 2.362 | 2.178 |
| 2.994 | 3.249 | 3.093 | 3.008 | 3.051 |
| 1.6 | 2.161 | 1.854 | 2.35 | 2.102 |
| 2.622 | 2.546 | 2.501 | 2.589 | 2.545 |
| 3.593 | 3.032 | 3.164 | 3.554 | 3.359 |
| 2.4 | 2.493 | 2.457 | 2.193 | 2.325 |
| 2.385 | 2.478 | 2.529 | 2.424 | 2.477 |
| 2.2 | 2.117 | 1.694 | 1.473 | 1.583 |
| 2.996 | 3.225 | 3.064 | 3.029 | 3.047 |

4.6 The learning curves of the proposed CNN-LSTM model

Figure 5 shows the learning curves of the proposed CNN-LSTM model, that is, the trend in training and validation losses across 50 epochs. The training loss (blue curve) declines steeply within a few epochs, indicating that the model learned most of the underlying patterns in the training data quickly.
Similarly, the validation loss decreases sharply in the first few epochs, indicating the model's ability to generalize to unseen validation data. As the epochs progress, both curves converge and flatten, with the validation loss consistently lower than the training loss. This suggests that the model does not overfit the training data and has learned the patterns relevant to predicting maize yield. The stability of the model is further supported by the consistent convergence of both curves towards a low mean squared error. In the final epochs, the validation loss stabilizes, indicating that further training yields little additional improvement. This behaviour demonstrates the effectiveness of the CNN-LSTM architecture in modeling both spatial dependencies (via the CNN) and temporal dependencies (via the LSTM) in the dataset, thereby enabling accurate prediction of maize yield. The small gap between the training and validation losses further indicates a good balance between bias and variance, supporting the model's reliability for real-world applications.

Fig. 4 Relationship between the predicted maize yields and actual maize yields

4.7 The feature importance scores

Figure 6 shows the feature importance scores of the ML model in predicting maize yield. The prominence of 'Year' as the single most important feature underscores the high interannual variation in maize yield in Uganda. 'Year' serves as a chronological indicator that integrates broader forces, including climatic changes, agronomic practices, and socio-economic variations that occur annually but are poorly captured by other covariates. For instance, changes in rainfall onset, distribution, and intensity from one year to the next significantly influence maize yields, and incorporating the 'Year' variable enables the model to capture these changes.
Similar results have been reported in other studies, with 'Year' emerging as a significant predictor of crop yield variability because it captures systematic changes across growing seasons [53]. Thus, the prominence of 'Year' in this analysis does not indicate bias in the model, but rather reflects its role in capturing interannual variability, which has a significant impact on yield outcomes in smallholder farming systems.

Fig. 5 The learning curves of the proposed CNN-LSTM model

Fig. 6 The feature importance scores

The next most important feature is MAX_TEMP, which reflects the critical role of maximum temperature in influencing maize growth and yield. Because very high temperatures can directly reduce crop productivity, this feature is essential for accurate prediction. The ZARDI variable, representing geographical zones, ranks third, underscoring that yield varies across regions due to climatic and environmental conditions. Among the remote sensing variables, EVI and NDWI are the most significant contributors, indicating that vegetation health and water availability are key factors in yield prediction. SOLAR_RAD and NDVI, which relate to photosynthesis and crop development, contributed moderately. Features such as RAINFALL, CCI (Canopy Chlorophyll Index), SOIL_MOISTURE, and MIN_TEMP received relatively low scores, yet they still add to the model's predictive potential. Although RAINFALL was not the highest-ranked feature overall, it remains agronomically critical. Its influence on yield is partly indirect; adequate rainfall improves VIs, such as NDVI, NDWI, and EVI, but extreme rainfall shortages or excesses directly impact yield and are reflected in the model's predictions.
Together with soil moisture, it reflects water availability, a principal constraint on maize in Uganda's semi-arid regions. Overall, this analysis highlights the interplay of temporal, climatic, spatial, and vegetation features, which the model leverages to enhance the robustness and accuracy of maize yield predictions.

4.8 The box plot of residuals for the proposed CNN-LSTM model

Figure 7 presents the box plot of residuals for the CNN-LSTM model. It describes the distribution of errors, that is, the differences between the predicted and actual maize yields. The residuals are distributed symmetrically around the median, which is approximately zero. This suggests the model has no significant bias towards underestimating or overestimating maize yield. The interquartile range (IQR), shown by the blue box, contains the central half of the residuals, which are small and fall within an acceptable range. The whiskers extend to the minimum and maximum values of the residuals; there are no extreme outliers, suggesting the model is consistent across the dataset. The compactness of the residual distribution indicates that the CNN-LSTM model effectively learns the underlying data pattern, thereby providing reliable predictions. Overall, this plot supports the claim that the CNN-LSTM minimizes prediction error and produces balanced, unbiased results for maize yield prediction.

Fig. 7 The box plot of residuals for the CNN-LSTM model

5 Discussion of results

The results of this study indicate that the proposed CNN-LSTM model significantly enhances the accuracy of maize yield prediction compared to individual models such as CNN and Random Forest.
With an MSE of 0.1068 tonnes², an RMSE of 0.3268 tonnes, and an R² score of 0.7833, the CNN-LSTM model demonstrated superior performance in learning both spatial and temporal dependencies in the data. These findings align with existing research that highlights the effectiveness of DL architectures for yield prediction [8, 35].

Random Forest (RF) performed moderately well, achieving an MSE of 0.2105 tonnes², an RMSE of 0.4588 tonnes, and an R² score of 0.5730. While RF effectively captured complex relationships in the dataset, its reliance on static feature selection limited its ability to capture long-term dependencies in maize yield trends. This observation is consistent with previous studies that found ensemble tree-based methods, such as RF, to be strong predictors on structured agronomic data but limited in time-series forecasting [18]. In contrast, the CNN-LSTM model integrated sequential learning, enabling it to detect the seasonal variations and long-term dependencies critical for yield prediction and to surpass RF in predictive accuracy.

Though effective in capturing spatial features, the CNN-only model performed worse than the CNN-LSTM, with an MSE of 0.2158 tonnes² and an R² score of 0.5622. This suggests that a CNN alone cannot fully capture temporal patterns and sequence-based variations in yield trends. Similar findings were reported by Sun et al. [23], who observed that CNN models performed better when combined with sequence-learning models, such as LSTMs, in crop yield estimation. Furthermore, an ensemble of CNN and Random Forest showed substantial improvements over either model alone, achieving an MSE of 0.1368 tonnes², an RMSE of 0.3699 tonnes, and an R² score of 0.7225. This improvement highlights that combining convolutional feature extraction from a CNN with decision-tree-based feature importance analysis from an RF enhances predictive accuracy.
Similar trends were observed in studies where hybrid models outperformed standalone ML models by leveraging multiple feature extraction strategies [54].

Comparing the CNN-LSTM model's performance with prior studies, its R² score of 0.78 matches that of Zhou et al. [22], who achieved an R² of 0.78 using a CNN-Attention-LSTM model for maize yield prediction in China. Their model leveraged a similar multi-source dataset, incorporating remote sensing vegetation indices and climate variables. The proposed CNN-LSTM model also exhibited lower bias and variance in residual analysis than traditional ML models. This aligns with the findings of Muruganantham et al. [13], who noted that deep learning models exhibit greater robustness when trained on multi-source datasets and with synthetic data augmentation techniques.

One key novelty of this study was the use of synthetic oversampling (SMOGN) to enhance model training. Acquiring high-quality ground-truth maize yield data across Uganda's ZARDI zones is challenging due to financial and logistical constraints. Consequently, the study used a dataset covering only three years, 2018 to 2020 (two seasons per year). Applying SMOGN increased the training set to 295 samples, ensuring that low- and high-yield cases were better represented. This augmentation improved predictive accuracy, consistent with prior findings, such as those of Ebrahimy et al. [30] and Thihlum and Khiangte [42], that synthetic data can enhance crop yield models.

Feature importance analysis identified 'Year' as the most influential predictor. This likely reflects unmeasured temporal trends, such as gradual improvements in seed varieties and farming practices, that influenced yields from 2018 to 2020. In other words, the 'Year' feature may serve as a proxy for factors not explicitly represented in the dataset.
Among other top features are MAX_TEMP, NDVI, and EVI, which is sensible given maize's sensitivity to heat and the importance of canopy greenness during growth. The presence of NDWI and CCI among the important variables further underscores that moisture status and chlorophyll levels are key factors in maize productivity. These findings are consistent with prior studies, which have shown that temperature and VIs are strongly linked to maize growth stages [17, 55, 56]. They also suggest that expanding irrigation systems in the north and northeast regions, where rainfall deficits are most significant, could secure yields and strengthen climate resilience.

The findings highlight the capacity of hybrid DL methods to mitigate data scarcity in smallholder farming, yielding more accurate and rapid yield estimates. Such advances should aid agricultural planning, enhance resource utilization, and bolster food security efforts in areas with limited data.

5.1  Limitations and future work

This study has several limitations, which also highlight directions for future research. The dataset, covering the period from 2018 to 2020, was relatively small, which may limit temporal generalizability and increases the risk of overfitting. Although synthetic oversampling, specifically SMOGN, was applied to address rare-yield situations, it assumes local smoothness, which may produce unrealistic data in high-complexity attribute spaces. Moreover, the modest sample size precluded explicit cross-zone validation: splitting the data by zone would have further reduced sample sizes, risking instability during training and yielding unreliable estimates. Instead, we employed fivefold stratified cross-validation, together with SMOGN-based oversampling, dropout, and early stopping, to improve robustness and reduce the risk of overfitting. While this approach achieved high predictive accuracy across the entire dataset, the lack of zone-specific holdout testing limits generalizability to novel agroecological zones.
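The zone-specific holdout testing noted above as missing could, given enough data, be organised as leave-one-zone-out splits. The following is a minimal sketch under stated assumptions, not the study's procedure: the function name `leave_one_zone_out` and the zone labels are hypothetical, and the toy data stand in for real ZARDI records.

```python
from collections import defaultdict


def leave_one_zone_out(zones):
    """Yield (held_out_zone, train_idx, test_idx), holding each zone out once."""
    by_zone = defaultdict(list)
    for idx, zone in enumerate(zones):
        by_zone[zone].append(idx)
    for held_out in sorted(by_zone):
        test_idx = by_zone[held_out]
        train_idx = [i for i, z in enumerate(zones) if z != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy labels: three zones with two seasonal records each.
zones = ["zone_a", "zone_a", "zone_b", "zone_b", "zone_c", "zone_c"]
splits = list(leave_one_zone_out(zones))

for held_out, train_idx, test_idx in splits:
    # Any oversampling or scaling would be fitted on train_idx only,
    # so the held-out zone stays unseen until evaluation.
    print(held_out, "train:", train_idx, "test:", test_idx)
```

In such a scheme, each score measures transfer to a zone the model never saw, which is a stricter test than stratified folds that mix all zones.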
The exclusion of high-resolution imagery, such as Sentinel-2 or high-resolution MODIS products, which better capture localized yield variability, also constrained spatial resolution. This limitation further restricted the use of data-intensive architectures, such as Transformers or 3D CNNs. Additionally, model explainability remains incomplete: Random Forest importance scores provide only a global ranking of predictors without indicating the direction of their effects, i.e., whether a feature increases or decreases yield. Methods such as SHAP or LIME could yield more informative, locally interpretable insights. Prediction uncertainty, such as confidence intervals, was not assessed; the analysis relied solely on deterministic performance metrics (e.g., R²), which provide only point estimates on test data. Finally, although the CNN-LSTM model demonstrated improved predictive ability, the baseline ensemble models relied on equal-weight averaging, which may not fully leverage the CNN-LSTM's relative strengths under varying yield conditions. These limitations highlight the need for future research to enhance the CNN-LSTM framework through adaptive learning and rigorous uncertainty quantification.

  • Adaptive ensemble learning. Moving beyond equal-weight averaging, weighted stacking can optimize each model's contribution, which may lead to further improvements in accuracy [57].
  • Uncertainty quantification. Future work should also focus on quantifying prediction uncertainty. Generating confidence or prediction intervals, for example through Bayesian neural networks, would convey valuable information about the reliability of the model's forecasts. Such an uncertainty estimate is beneficial for decision-makers, as it indicates the level of confidence associated with each predicted yield and helps assess risk in planning [58].
  • Integration of Transformer-based models.
Future research can explore advanced architectures, such as BiLSTMs, 3D CNNs, and CNN-Transformer hybrids with attention mechanisms, for maize yield prediction [52]. Recent studies [25] suggest that Transformers can outperform LSTMs in capturing long-range dependencies.
  • Incorporation of high-resolution satellite imagery. Expanding the dataset to include higher-resolution data, such as Sentinel-2 and fine-resolution MODIS products, as well as UAV- and IoT-derived field observations, could improve spatial detail and enhance predictive accuracy by capturing more localized crop variability [52].
  • Explainable AI techniques. Incorporating interpretability methods, such as SHAP values, would provide valuable insights into the influence of individual features on the model's predictions. Complementary tools, such as LIME for local interpretability and Grad-CAM for visualizing CNN-LSTM decision processes, would further improve transparency and user trust in the model's outputs [59].
  • Spatial cross-validation. Finally, future research should involve spatial cross-validation using larger and more geographically diverse datasets, enabling systematic evaluation of the model's transferability across the highly heterogeneous maize-production environments in Uganda [60].

6  Conclusion

This study proposed a CNN-LSTM model to forecast maize yields across Uganda's ZARDI zones, integrating climate and remote-sensing data. A combined dataset capturing the interactions between climatic variables and maize yield was created through a comprehensive pre-processing pipeline involving feature scaling, synthetic data augmentation, and dimensionality reduction. The CNN-LSTM model outperformed the single-model baselines, CNN and Random Forest, substantially reducing prediction errors and achieving an R² value of 0.78.
Ensembling also improved robustness and precision: the CNN-Random Forest ensemble raised the R² score to 0.72, above either the standalone CNN or Random Forest, underscoring the complementary strengths of deep learning and traditional machine learning. The residual analysis supported the reliability of the proposed model, with minimal bias in predictions and a strong fit to actual maize yield values. This work underlines the importance of using multi-source datasets and state-of-the-art ML techniques to address the challenges of agricultural yield forecasting.

By contributing both methodological innovation and regional insights, CNN-LSTM approaches emerge as a potentially scalable tool for yield forecasting in data-scarce smallholder systems across SSA. They can help decision-makers and smallholder farmers to optimize resource allocation, mitigate risk, and plan farming practices effectively. While our model improved performance, its limitations include a short data span, no uncertainty quantification, no high-resolution satellite imagery, no spatial cross-validation, and limited interpretability. Future work will incorporate larger, multi-year datasets, higher-resolution imagery, uncertainty quantification, and explainable AI tools, along with advanced architectures such as Transformers, to yield more robust forecasts and stronger decision support for agricultural stakeholders. Leveraging ML and DL technologies to predict crop yields can thus contribute to advancing modern agriculture and alleviating global hunger.
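The error metrics quoted in the discussion are linked by RMSE = sqrt(MSE), so the reported figures can be cross-checked in a few lines. The snippet below is only an illustrative consistency check, not part of the study's pipeline; the names `reported`, `TOLERANCE`, and `cnn_only_rmse` are assumptions introduced here, and the numbers are those quoted in the text. No RMSE is stated for the CNN-only model, so only the value implied by its MSE is computed.

```python
import math

# MSE (tonnes^2) and RMSE (tonnes) as quoted in the discussion.
reported = {
    "CNN-LSTM": (0.1068, 0.3268),
    "Random Forest": (0.2105, 0.4588),
    "CNN-Random Forest ensemble": (0.1368, 0.3699),
}

TOLERANCE = 5e-4  # assumed allowance for four-decimal rounding

# True where the quoted RMSE agrees with the square root of the quoted MSE.
consistent = {
    name: abs(math.sqrt(mse) - rmse) < TOLERANCE
    for name, (mse, rmse) in reported.items()
}

# The text gives no RMSE for the CNN-only model; this is the value implied by its MSE.
cnn_only_rmse = math.sqrt(0.2158)

for name, ok in consistent.items():
    print(f"{name}: RMSE matches sqrt(MSE) -> {ok}")
print(f"CNN-only implied RMSE: {cnn_only_rmse:.4f} tonnes")
```

Because RMSE is expressed in the same unit as yield, it is the easier of the two metrics to compare directly against typical yield levels.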
Abbreviations
CNN: Convolutional neural network
LSTM: Long short-term memory
RF: Random forest
ANN: Artificial neural network
ML: Machine learning
DL: Deep learning
CYP: Crop yield prediction
SSA: Sub-Saharan Africa
ZARDI: Zonal Agricultural Research and Development Institute
NDVI: Normalized difference vegetation index
EVI: Enhanced vegetation index
NDWI: Normalized difference water index
LAI: Leaf area index
GPP: Gross primary productivity
SMOGN: Synthetic minority over-sampling technique for regression with Gaussian noise
SMOTER: Synthetic minority over-sampling technique for regression
MSE: Mean squared error
RMSE: Root mean squared error
MAE: Mean absolute error
R²: Coefficient of determination
MAPE: Mean absolute percentage error
IQR: Interquartile range
UAV: Unmanned aerial vehicle
VAE: Variational autoencoders
GAN: Generative adversarial networks
LIDAR: Light detection and ranging
CCI: Chlorophyll content index
AI: Artificial intelligence
SHAP: SHapley Additive exPlanations
LIME: Local interpretable model-agnostic explanations
Grad-CAM: Gradient-weighted class activation mapping

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1007/s44163-026-00855-7.
Supplementary Material 1

Acknowledgements
The authors gratefully acknowledge Kyambogo University for providing access to resources, a conducive research environment, and financial support.

Author contributions
Danison Taremwa: conceptualization, methodology, investigation, writing of the original draft. Emmanuel Ahishakiye: supervision, guidance, review & editing. Aggrey Obbo: review & editing. Paul Kategaya Kisozi: review & editing. Fred Kaggwa: supervision, guidance, review & editing. All authors approved the manuscript.

Funding
No organization, institution, or research centre funded this study.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Declarations

Ethics approval and consent to participate
Not applicable.

Clinical trial number
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Received: 2 August 2025 / Accepted: 6 January 2026

References
1. Sambasivam G, Opiyo GD. A predictive machine learning application in agriculture: cassava disease detection and classification with imbalanced dataset using convolutional neural networks. Egypt Inform J. 2021;22(1):27–34. https://doi.org/10.1016/j.eij.2020.02.007.
2. Darra N, Anastasiou E, Kriezi O, Lazarou E, Kalivas D, Fountas S. Can yield prediction be fully digitilized? A systematic review. Agron. 2023;13(9):1–53. https://doi.org/10.3390/agronomy13092441.
3. Tende IG, Aburada K, Yamaba H, Katayama T, Okazaki N. Development and evaluation of a deep learning based system to predict district-level maize yields in Tanzania. Agric. 2023;13(3):1–19. https://doi.org/10.3390/agriculture13030627.
4. Dabija A, Ciocan ME, Chetrariu A, Codină GG. Maize and sorghum as raw materials for brewing, a review. Appl Sci. 2021. https://doi.org/10.3390/app11073139.
5. Chivasa W, Mutanga O, Biradar C. Application of remote sensing in estimating maize grain yield in heterogeneous African agricultural landscapes: a review. Int J Remote Sens. 2017;38(23):6816–45. https://doi.org/10.1080/01431161.2017.1365390.
6. Mahesh P, Soundrapandiyan R. Yield prediction for crops by gradient-based algorithms. PLoS ONE. 2024;19(8):1–20. https://doi.org/10.1371/journal.pone.0291928.
7. Yewle AD, Mirzayeva L, Karakuş O. Multi-modal data fusion and deep ensemble learning for accurate crop yield prediction. 2025, [Online].
Available: http://arxiv.org/abs/2502.06062
8. Khaki S, Wang L, Archontoulis SV. A CNN-RNN framework for crop yield prediction. Front Plant Sci. 2020;10:1–14. https://doi.org/10.3389/fpls.2019.01750.
9. Lobell DB, et al. Eyes in the sky, boots on the ground: assessing satellite- and ground-based approaches to crop yield measurement and analysis. Am J Agric Econ. 2020;102(1):202–19. https://doi.org/10.1093/ajae/aaz051.
10. Satpathi A, et al. Comparative analysis of statistical and machine learning techniques for rice yield forecasting for Chhattisgarh, India. Sustain. 2023;15(3):1–18. https://doi.org/10.3390/su15032786.
11. Aljahdali MO, Munawar S, Khan WR. Monitoring mangrove forest degradation and regeneration: Landsat time series analysis of moisture and vegetation indices at Rabigh Lagoon, Red Sea. Forests. 2021;12(1):1–19. https://doi.org/10.3390/f12010052.
12. Mohammad N, Islam MA, Rahman MM, Ahmed I, Mahboob G. Yield forecasting model for maize using satellite multispectral imagery driven vegetation indices. Qeios. 2023. https://doi.org/10.32388/coebsc.
13. Muruganantham P, Wibowo S, Grandhi S, Samrat NH, Islam N. A systematic literature review on crop yield prediction with deep learning and remote sensing. Remote Sens. 2022. https://doi.org/10.3390/rs14091990.
14. Ali AM, et al. Integrated method for rice cultivation monitoring using Sentinel-2 data and leaf area index. Egypt J Remote Sens Space Sci. 2021;24(3):431–41. https://doi.org/10.1016/j.ejrs.2020.06.007.
15. Yang W, et al. Estimation of corn yield based on hyperspectral imagery and convolutional neural network. Comput Electron Agric. 2021. https://doi.org/10.1016/j.compag.2021.106092.
16. Nejad SMM, Abbasi-Moghadam D, Sharifi A, Farmonov N, Amankulova K, Laszlz M. Multispectral crop yield prediction using 3D-convolutional neural networks and attention convolutional LSTM approaches.
IEEE J Sel Top Appl Earth Observ Remote Sens. 2023;16:254–66. https://doi.org/10.1109/JSTARS.2022.3223423.
17. Lu J, et al. Deep learning for multi-source data-driven crop yield prediction in northeast China. Agriculture. 2024;14(6):794.
18. Han Y, et al. Prediction of maize cultivar yield based on machine learning algorithms for precise promotion and planting. Agric For Meteorol. 2024. https://doi.org/10.1016/j.agrformet.2024.110123.
19. Kumar D. Biographical notes: Priyanka received her Bachelor of Technology in Computer Science and Engineering (CSE) and Master of Technology in CSE from GJUS&T. Int J Inf Decis Sci. 2020;12(3):246–69.
20. Wang Y, Feng K, Sun L, Xie Y, Song XP. Satellite-based soybean yield prediction in Argentina: a comparison between panel regression and deep learning methods. Comput Electron Agric. 2024. https://doi.org/10.1016/j.compag.2024.108978.
21. Shahhosseini M, Hu G, Khaki S, Archontoulis SV. Corn yield prediction with ensemble CNN-DNN. Front Plant Sci. 2021;12:1–13. https://doi.org/10.3389/fpls.2021.709008.
22. Zhou W, et al. A prediction model of maize field yield based on the fusion of multitemporal and multimodal UAV data: a case study in Northeast China. Remote Sens. 2023. https://doi.org/10.3390/rs15143483.
23. Sun J, Lai Z, Di L, Sun Z, Tao J, Shen Y. Multilevel deep learning network for county-level corn yield estimation in the U.S. corn belt. IEEE J Sel Top Appl Earth Obs Remote Sens. 2020;13:5048–60. https://doi.org/10.1109/JSTARS.2020.3019046.
24. Harsányi E, et al. Data mining and machine learning algorithms for optimizing maize yield forecasting in central Europe. Agronomy. 2023;13(5):1–22. https://doi.org/10.3390/agronomy13051297.
25. Ma Y, Yang Z, Huang Q, Zhang Z. Improving the transferability of deep learning models for crop yield prediction: a partial domain adaptation approach. Remote Sens. 2023;15(18):4562. https://doi.org/10.3390/rs15184562.
26. Hu X, Chen S, Zhang D.
Domain adaptation in agricultural image analysis: a comprehensive review from shallow models to deep learning. pp. 1–24, 2025, [Online]. Available: http://arxiv.org/abs/2506.05972
27. UBOS. Annual agricultural survey. Report, no. 2, pp. 2–5, 2022, [Online]. Available: https://eur-lex.europa.eu/legal-content/PT/TXT/PDF/?uri=CELEX:32016R0679%26from=PT%0Ahttp://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:52012PC0011:pt:NOT
28. UBOS. Statistical Abstract. Uganda Bureau of Statistics, pp. 1–336, 2022, [Online]. Available: http://www.ubos.org/onlinefiles/uploads/ubos/pdf documents/abstracts/Statistical Abstract 2013.pdf
29. Branco P, Ribeiro RP, Torgo L, Krawczyk B, Moniz N. SMOGN: a pre-processing approach for imbalanced regression. Proc Mach Learn Res. 2017;74:36–50.
30. Ebrahimy H, Wang Y, Zhang Z. Utilization of synthetic minority oversampling technique for improving potato yield prediction using remote sensing data and machine learning algorithms with small sample size of yield data. ISPRS J Photogramm Remote Sens. 2023;201:12–25. https://doi.org/10.1016/j.isprsjprs.2023.05.015.
31. Bali N, Singla A. Emerging trends in machine learning to predict crop yield and study its influential factors: a survey. Arch Comput Methods Eng. 2022;29(1):95–112. https://doi.org/10.1007/s11831-021-09569-8.
32. Jafarbiglu H, Pourreza A. A comprehensive review of remote sensing platforms, sensors, and applications in nut crops. Comput Electron Agric. 2022;197:106844. https://doi.org/10.1016/j.compag.2022.106844.
33. Bassine FZ, Epule TE, Kechchour A, Chehbouni A.
Recent applications of machine learning, remote sensing, and IoT approaches in yield prediction: a critical review. 2023, [Online]. Available: http://arxiv.org/abs/2306.04566
34. Giovos R, Tassopoulos D, Kalivas D, Lougkos N, Priovolou A. Remote sensing vegetation indices in viticulture: a critical review. Agriculture. 2021. https://doi.org/10.3390/agriculture11050457.
35. Sun J, Di L, Sun Z, Shen Y, Lai Z. County-level soybean yield prediction using deep CNN-LSTM model. Sensors (Switzerland). 2019;19(20):1–21. https://doi.org/10.3390/s19204363.
36. Di Y, Gao M, Feng F, Li Q, Zhang H. A new framework for winter wheat yield prediction integrating deep learning and Bayesian optimization. Agronomy. 2022;12(12):1–15. https://doi.org/10.3390/agronomy12123194.
37. Fathi M, Shah-Hosseini R, Moghimi A. 3D-ResNet-BiLSTM model: a deep learning model for county-level soybean yield prediction with time-series Sentinel-1, Sentinel-2 imagery, and Daymet data. Remote Sens. 2023;15(23):1–20. https://doi.org/10.3390/rs15235551.
38. Epule TE, Dhiba D, Etongo D, Peng C, Lepage L. Identifying maize yield and precipitation gaps in Uganda. SN Appl Sci. 2021;3(5):1–12. https://doi.org/10.1007/s42452-021-04532-5.
39. Joshi A, et al. An explainable Bi-LSTM model for winter wheat yield prediction. Front Plant Sci. 2024;15(January):1–17. https://doi.org/10.3389/fpls.2024.1491493.
40. Li L, et al. Improving the estimation of alfalfa yield based on multi-source satellite data and the synthetic minority oversampling strategy. Comput Electron Agric. 2025;236:110497. https://doi.org/10.1016/j.compag.2025.110497.
41. Elabd E, Hamouda HM, Ali MAM, Fouad Y. Climate change prediction in Saudi Arabia using a CNN GRU LSTM hybrid deep learning model in al Qassim region. Sci Rep. 2025;15(1):1–19. https://doi.org/10.1038/s41598-025-00607-0.
42. Thihlum Z, Khiangte C.
Impact of SMOGN on regression models for crop yield prediction in Mizoram agriculture. no. May, 2025. https://doi.org/10.1007/978-3-031-88039-1.
43. Li ZZ, Huang N, Yi LZ, Fu GH. Affine combination-based over-sampling for imbalanced regression. J Chemom. 2024;38(3):1–22. https://doi.org/10.1002/cem.3537.
44. Cao Z, Zhang Z. Corn yield prediction based on remotely sensed variables using variational autoencoder and multiple instance regression. arXiv:2211.13286v1 [cs.CV]. pp. 1–5.
45. El-Kenawy ESM, Alhussan AA, Khodadadi N, Mirjalili S, Eid MM. Predicting potato crop yield with machine learning and deep learning for sustainable agriculture, vol. 68, no. 1. Springer Netherlands, 2025. https://doi.org/10.1007/s11540-024-09753-w.
46. Kumar R, Lad YA, Kumari P. Forecasting potato prices in Agra: comparison of linear time series statistical vs. neural network models. Potato Res. 2025. https://doi.org/10.1007/s11540-024-09838-6.
47. Jadon A, Patil A, Jadon S. A comprehensive survey of regression based loss functions for time series forecasting. 2022, [Online]. Available: http://arxiv.org/abs/2211.02989
48. Terven JR, Cordova-esparza DM, Ramirez-pedraza A, Chavez-urbiola EA, Romero-gonzalez JA. L f m d l. pp. 1–76.
49. Steurer M, Hill RJ, Pfeifer N. Metrics for evaluating the performance of machine learning based automated valuation models. J Prop Res. 2021;38(2):99–129. https://doi.org/10.1080/09599916.2020.1858937.
50. Plevris V, Solorzano G, Bakas NP, Ben Seghier MEA. Investigation of performance metrics in regression analysis and machine learning-based prediction models. World Congr Comput Mech ECCOMAS Congr, pp.
0–25, 2022, https://doi.org/10.23967/eccomas.2022.155.
51. Eldele E, Ragab M, Chen Z, Wu M, Li X. TSLANet: rethinking transformers for time series representation learning. Proc Mach Learn Res. 2024;235:12409–28.
52. Challenges P. Applied deep learning-based crop yield prediction: a systematic analysis of current developments and potential challenges. Technologies. 2024;12(4):43.
53. Schumacher BL, Burchfield EK, Bean B, Yost MA. Leveraging important covariate groups for corn yield prediction. Agric. 2023;13(3):1–18. https://doi.org/10.3390/agriculture13030618.
54. Berveglieri A, et al. Remote prediction of Soybean yield using UAV-based hyperspectral imaging and machine learning models. AgriEngineering. 2024;6(3):3242–60. https://doi.org/10.3390/agriengineering6030185.
55. Yu L, et al. Near surface camera informed agricultural land monitoring for climate smart agriculture. Clim Smart Agric. 2024;1(1):100008. https://doi.org/10.1016/j.csag.2024.100008.
56. Kenduiywo BK, Miller S. Seasonal maize yield forecasting in South and East African countries using hybrid Earth observation models. Heliyon. 2024;10(13):e33449. https://doi.org/10.1016/j.heliyon.2024.e33449.
57. Tsang TK, Du Q, Cowling BJ, Viboud C. An adaptive weight ensemble approach to forecast influenza activity in an irregular seasonality context. Nat Commun. 2024;15(1):1–12. https://doi.org/10.1038/s41467-024-52504-1.
58. Ma Y, Zhang Z, Kang Y, Özdoğan M. Corn yield prediction and uncertainty analysis based on remotely sensed variables using a Bayesian neural network approach. Remote Sens Environ. 2021. https://doi.org/10.1016/j.rse.2021.112408.
59. Bouni M, Hssina B, Douzi K, Douzi S. Interpretable machine learning techniques for an advanced crop recommendation model. J Electr Comput Eng. 2024. https://doi.org/10.1155/2024/7405217.
60.
Habibi LN, Matsui T, Tanaka TST. Critical evaluation of the effects of a cross-validation strategy and machine learning optimization on the prediction accuracy and transferability of a soybean yield prediction model using UAV-based remote sensing. J Agric Food Res. 2024;16:101096. https://doi.org/10.1016/j.jafr.2024.101096.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.