Nepal's agricultural sector feeds over 65% of the population, yet farmers and policymakers have almost no data-driven tools to anticipate crop yield fluctuations. Climate variability — erratic monsoons, temperature shifts, unexpected frost — makes season-to-season yield prediction genuinely difficult.
This project built a full machine learning pipeline that predicts district-level crop yields across Nepal by combining historical agricultural statistics with meteorological data from NASA's POWER satellite system. The goal: give farmers and agricultural planners a tool that actually works with Nepal's limited data infrastructure.
- **Ministry of Agriculture and Livestock Development** — national crop statistics
- **NASA POWER** (Prediction of Worldwide Energy Resources) — satellite-derived meteorological data
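NASA POWER exposes its point data through a public REST API. A minimal sketch of how a district-level daily weather series could be requested — the endpoint and `community=AG` (agroclimatology) are real POWER API conventions, but the exact parameter set (`T2M`, `PRECTOTCORR`, `RH2M`) and the Kathmandu coordinates are illustrative assumptions, not necessarily what this project used:

```python
from urllib.parse import urlencode

BASE = "https://power.larc.nasa.gov/api/temporal/daily/point"

def power_request_url(lat, lon, start, end,
                      parameters=("T2M", "PRECTOTCORR", "RH2M")):
    """Build a NASA POWER point-API request URL.

    parameters: POWER variable codes (temperature at 2 m, corrected
    precipitation, relative humidity at 2 m) — an assumed subset.
    start/end: dates as YYYYMMDD strings.
    """
    query = {
        "parameters": ",".join(parameters),
        "community": "AG",     # agroclimatology community
        "latitude": lat,
        "longitude": lon,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    return f"{BASE}?{urlencode(query)}"

# Example: daily series for a point near Kathmandu, calendar year 2020
url = power_request_url(27.7, 85.3, "20200101", "20201231")
```

Fetching `url` with any HTTP client returns a JSON payload keyed by variable code and date, which can then be aggregated to growing-season features per district.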
The core model evaluation loop: the target is log1p-transformed before training, each of the eight regression algorithms (seven scikit-learn models plus XGBoost) is fit on the log scale, and metrics are computed after inverting the transform back to the original scale:
```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

# Target: log1p-transformed for better regression performance
y = np.log1p(df_encoded['h/ha_yield'])

def evaluate_model(model, name, X_train, X_test, y_train, y_test):
    """Fit on the log scale, report metrics on the original scale."""
    model.fit(X_train, y_train)
    y_pred_log = model.predict(X_test)
    # Convert predictions and targets back to the original scale
    y_pred_orig = np.expm1(y_pred_log)
    y_test_orig = np.expm1(y_test)
    r2 = r2_score(y_test_orig, y_pred_orig)
    mse = mean_squared_error(y_test_orig, y_pred_orig)
    return r2, mse

models = [
    ("Linear Regression", LinearRegression()),
    ("Ridge Regression", Ridge()),
    ("Lasso Regression", Lasso()),
    ("Decision Tree", DecisionTreeRegressor(random_state=42)),
    ("Random Forest", RandomForestRegressor(random_state=42)),
    ("Gradient Boosting", GradientBoostingRegressor(random_state=42)),
    ("Support Vector Regression", SVR()),
]

for name, model in models:
    r2, mse = evaluate_model(model, name, X_train, X_test, y_train, y_test)
    print(f"{name}: R² = {r2:.4f}, MSE = {mse:.4f}")

# XGBoost trained separately using xgb.train() with early stopping
# Best result → R² = 0.8175, MSE = 0.2031
```
All models evaluated on a held-out 20% test set. Metrics computed on the original scale (after inverse log1p transform). XGBoost achieved the best R² and lowest MSE:
| Model | R² Score | MSE | Notes |
|---|---|---|---|
| **XGBoost** (best) | 0.8175 | 0.2031 | Gradient boosting with early stopping, handles non-linearity best |
| Random Forest | 0.7879 | 0.2360 | Strong ensemble baseline, slightly below XGBoost |
| Gradient Boosting | 0.7422 | 0.2869 | Good but slower convergence than XGBoost |
| Decision Tree | 0.6285 | 0.4135 | Prone to overfitting without pruning |
| SVR | 0.4784 | 0.5805 | Sensitive to feature scaling, limited on tabular data |
| Linear Regression | 0.3907 | 0.6781 | Misses non-linear crop-weather interactions |
| Ridge Regression | 0.3906 | 0.6782 | Marginal improvement over Linear Regression |
| Lasso Regression | -0.0001 | 1.1130 | Default α over-regularized the log-scale target; predictions collapsed to the mean |
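The train-on-log, score-on-original round trip behind these numbers can be exercised end to end on synthetic data (illustrative only — this is not the project's dataset, and the R²/MSE it produces are unrelated to the table above):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Skewed, strictly positive target — the case log1p is meant to help with
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))
yield_orig = np.exp(2.0 * X[:, 0] + X[:, 1]) + rng.gamma(2.0, 0.2, size=500)

X_train, X_test, y_train_orig, y_test_orig = train_test_split(
    X, yield_orig, test_size=0.2, random_state=42)

# Fit on the log scale, then invert with expm1 before scoring
model = RandomForestRegressor(random_state=42)
model.fit(X_train, np.log1p(y_train_orig))
y_pred_orig = np.expm1(model.predict(X_test))

r2 = r2_score(y_test_orig, y_pred_orig)
mse = mean_squared_error(y_test_orig, y_pred_orig)
```

Because `expm1` is the exact inverse of `log1p`, predictions land back in the target's original units, so R² and MSE are directly comparable across models regardless of the training-time transform.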
Presented at the International Conference on Recent Trends in Artificial Intelligence · ICRTAI 2025
The paper presents the full methodology, dataset construction, feature engineering decisions, and model evaluation framework. It contextualizes the work within Nepal's agricultural data scarcity and argues for satellite-derived meteorological inputs as a scalable alternative to ground station networks.
The final XGBoost model was wrapped in a Streamlit web application that lets users input district, crop type, and meteorological parameters to get a predicted yield. Designed to be accessible to agricultural officers and researchers without requiring any coding knowledge.
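A minimal sketch of what such a Streamlit front end could look like — the widget labels, district/crop lists, feature order, and `model.pkl` artifact name are all hypothetical stand-ins for the real app's pipeline:

```python
# Run with: streamlit run app.py
import numpy as np

def to_feature_row(district_idx, crop_idx, rainfall_mm, temp_c):
    """Assemble one model-input row. Hypothetical feature order — the real
    app would reuse the training pipeline's district/crop encoders."""
    return np.array([[district_idx, crop_idx, rainfall_mm, temp_c]])

def main():
    # Imports kept local so the helper above stays importable without Streamlit
    import pickle
    import streamlit as st

    st.title("🌾 Crop Yield Prediction — Nepal")
    districts = ["Kathmandu", "Chitwan", "Kaski"]  # illustrative subset
    crops = ["Rice", "Maize", "Wheat"]             # illustrative subset

    district = st.selectbox("District", districts)
    crop = st.selectbox("Crop", crops)
    rainfall = st.number_input("Rainfall (mm)", min_value=0.0, value=1200.0)
    temp = st.number_input("Mean temperature (°C)", min_value=-10.0, value=22.0)

    if st.button("Predict yield"):
        with open("model.pkl", "rb") as f:   # assumed saved-model artifact
            model = pickle.load(f)
        row = to_feature_row(districts.index(district), crops.index(crop),
                             rainfall, temp)
        # Model was trained on log1p(yield); invert before display
        yield_pred = float(np.expm1(model.predict(row)[0]))
        st.metric("Predicted yield", f"{yield_pred:.2f}")

# Under `streamlit run`, invoke main() at module level.
```

Keeping the feature-assembly logic in a plain function separates the UI from the model interface, so the same encoding can be unit-tested without a running Streamlit server.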
🌾 **Crop Yield Prediction — Nepal**: input district, crop type, and weather parameters to get a predicted yield. Open Live App →