
Imports & Installation

%pip install pandas scikit-learn xgboost matplotlib seaborn
Requirement already satisfied: pandas in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (2.2.3)
Requirement already satisfied: scikit-learn in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (1.6.1)
Requirement already satisfied: xgboost in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (2.1.4)
Requirement already satisfied: matplotlib in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (3.9.4)
Requirement already satisfied: seaborn in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (0.13.2)
Requirement already satisfied: numpy>=1.22.4 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from pandas) (2.0.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from pandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from pandas) (2025.2)
Requirement already satisfied: scipy>=1.6.0 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from scikit-learn) (1.13.1)
Requirement already satisfied: joblib>=1.2.0 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from scikit-learn) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from scikit-learn) (3.6.0)
Requirement already satisfied: contourpy>=1.0.1 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from matplotlib) (1.3.0)
Requirement already satisfied: cycler>=0.10 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from matplotlib) (4.56.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from matplotlib) (1.4.7)
Requirement already satisfied: packaging>=20.0 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from matplotlib) (24.2)
Requirement already satisfied: pillow>=8 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from matplotlib) (11.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from matplotlib) (3.2.1)
Requirement already satisfied: importlib-resources>=3.2.0 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from matplotlib) (6.5.2)
Requirement already satisfied: zipp>=3.1.0 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from importlib-resources>=3.2.0->matplotlib) (3.21.0)
Requirement already satisfied: six>=1.5 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)

Note: you may need to restart the kernel to use updated packages.
%pip install --upgrade xgboost scikit-learn pandas cython --quiet

Note: you may need to restart the kernel to use updated packages.
%pip install shap --quiet

Note: you may need to restart the kernel to use updated packages.
%pip install folium
Collecting folium
  Downloading folium-0.19.5-py2.py3-none-any.whl.metadata (4.1 kB)
Collecting branca>=0.6.0 (from folium)
  Downloading branca-0.8.1-py3-none-any.whl.metadata (1.5 kB)
Requirement already satisfied: jinja2>=2.9 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from folium) (3.1.4)
Requirement already satisfied: numpy in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from folium) (2.0.2)
Requirement already satisfied: requests in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from folium) (2.32.3)
Collecting xyzservices (from folium)
  Downloading xyzservices-2025.1.0-py3-none-any.whl.metadata (4.3 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from jinja2>=2.9->folium) (3.0.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from requests->folium) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from requests->folium) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from requests->folium) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /Users/nikhilmaturi/nighthawk/nikhil_2025/venv/lib/python3.9/site-packages (from requests->folium) (2024.8.30)
Downloading folium-0.19.5-py2.py3-none-any.whl (110 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 110.9/110.9 kB 1.2 MB/s eta 0:00:00
Downloading branca-0.8.1-py3-none-any.whl (26 kB)
Downloading xyzservices-2025.1.0-py3-none-any.whl (88 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 88.4/88.4 kB 2.6 MB/s eta 0:00:00
Installing collected packages: xyzservices, branca, folium
Successfully installed branca-0.8.1 folium-0.19.5 xyzservices-2025.1.0

Note: you may need to restart the kernel to use updated packages.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
# from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

Data Exploration

Climate & Weather Similarity: Southern California vs Portugal

Large parts of both Southern California and Portugal have a Mediterranean climate (Köppen Csa or Csb), but there are some key nuances. Here is a structured comparison:


Similarities

| Feature | Southern California | Portugal | Similarity |
|---|---|---|---|
| Climate type | Mediterranean (Csa, with Csb in places along the coast) | Mediterranean (Csa in the south, Csb in the north) | ✅ Very similar |
| Dry summers | Very dry, especially inland | Dry, especially in the south | ✅ Very similar |
| Wet winters | Mildly wet winters | Winter is also the wettest season | ✅ Similar |
| Plenty of sun | 280–330 sunny days/year | 250–300 sunny days/year | ✅ Very similar |
| Moderate humidity | Usually low to moderate | Low to moderate, with coastal breezes | ✅ Similar |
| Wildfire risk | High in summer/fall | High in summer, especially in northern and central Portugal | ⚠️ Slight difference in scale |

Key Differences

| Factor | Southern California | Portugal |
|---|---|---|
| Temperature range | More extreme inland (e.g., Palm Springs) | More moderate overall due to the Atlantic |
| Rainfall | Less rainfall (~15 in/year) | More rainfall (~20–40 in/year in most areas) |
| Ocean influence | Cold California Current | Cool Canary (Portugal) Current, with summer coastal upwelling |
| Seasonal shift | Longer, drier summers | More defined spring/fall transitions |
| Coastal humidity | Drier (e.g., Santa Barbara) | More humid near the coast (esp. Lisbon, Porto) |

Overall Similarity Rating: 8.5/10

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv"
df = pd.read_csv(url)

print(df.head())
print(df.info())
print(df.describe())
   X  Y month  day  FFMC   DMC     DC  ISI  temp  RH  wind  rain  area
0  7  5   mar  fri  86.2  26.2   94.3  5.1   8.2  51   6.7   0.0   0.0
1  7  4   oct  tue  90.6  35.4  669.1  6.7  18.0  33   0.9   0.0   0.0
2  7  4   oct  sat  90.6  43.7  686.9  6.7  14.6  33   1.3   0.0   0.0
3  8  6   mar  fri  91.7  33.3   77.5  9.0   8.3  97   4.0   0.2   0.0
4  8  6   mar  sun  89.3  51.3  102.2  9.6  11.4  99   1.8   0.0   0.0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 517 entries, 0 to 516
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   X       517 non-null    int64  
 1   Y       517 non-null    int64  
 2   month   517 non-null    object 
 3   day     517 non-null    object 
 4   FFMC    517 non-null    float64
 5   DMC     517 non-null    float64
 6   DC      517 non-null    float64
 7   ISI     517 non-null    float64
 8   temp    517 non-null    float64
 9   RH      517 non-null    int64  
 10  wind    517 non-null    float64
 11  rain    517 non-null    float64
 12  area    517 non-null    float64
dtypes: float64(8), int64(3), object(2)
memory usage: 52.6+ KB
None
                X           Y        FFMC         DMC          DC         ISI  \
count  517.000000  517.000000  517.000000  517.000000  517.000000  517.000000   
mean     4.669246    4.299807   90.644681  110.872340  547.940039    9.021663   
std      2.313778    1.229900    5.520111   64.046482  248.066192    4.559477   
min      1.000000    2.000000   18.700000    1.100000    7.900000    0.000000   
25%      3.000000    4.000000   90.200000   68.600000  437.700000    6.500000   
50%      4.000000    4.000000   91.600000  108.300000  664.200000    8.400000   
75%      7.000000    5.000000   92.900000  142.400000  713.900000   10.800000   
max      9.000000    9.000000   96.200000  291.300000  860.600000   56.100000   

             temp          RH        wind        rain         area  
count  517.000000  517.000000  517.000000  517.000000   517.000000  
mean    18.889168   44.288201    4.017602    0.021663    12.847292  
std      5.806625   16.317469    1.791653    0.295959    63.655818  
min      2.200000   15.000000    0.400000    0.000000     0.000000  
25%     15.500000   33.000000    2.700000    0.000000     0.000000  
50%     19.300000   42.000000    4.000000    0.000000     0.520000  
75%     22.800000   53.000000    4.900000    0.000000     6.570000  
max     33.300000  100.000000    9.400000    6.400000  1090.840000  
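The `area` column is heavily right-skewed: in the summary above the median is 0.52 ha, the 75th percentile is 6.57 ha and the maximum is 1090.84 ha, while at least a quarter of the rows are exactly 0. That is why the target is modelled as `log1p(area)` rather than raw hectares. A minimal sketch, using only the values quoted from that table, shows how `np.log1p` compresses the range and how `np.expm1` inverts it exactly:

```python
import numpy as np

# Min, median, 75th percentile and max of `area` (ha), as printed by df.describe() above
area = np.array([0.0, 0.52, 6.57, 1090.84])

# log1p(x) = log(1 + x): well defined at 0 (log1p(0) == 0) and squeezes large values
area_log = np.log1p(area)
print(area_log.round(2))

# expm1(x) = exp(x) - 1 is the exact inverse, used to turn predictions back into hectares
area_back = np.expm1(area_log)
print(np.allclose(area_back, area))
```

A range of more than three orders of magnitude in hectares becomes roughly 0 to 7 on the log scale, so a handful of very large fires no longer dominates the squared-error splits of the forest.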

Setting up Labels, Features, and Data Splits

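# Encode month/day labels as integers. Note: .cat.codes numbers categories in sorted
# (alphabetical) order, e.g. apr=0, aug=1, ... for months and fri=0, mon=1, ... for days
df['month'] = df['month'].astype('category').cat.codes
df['day'] = df['day'].astype('category').cat.codes

# Features are every column except the target
X = df.drop('area', axis=1)
y = df['area']

# Burned area is heavily right-skewed (many zero-area fires), so model log(1 + area)
y_log = np.log1p(y)

# 80/20 train/test split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y_log, test_size=0.2, random_state=42)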
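One subtlety in the encoding step: `.astype('category').cat.codes` numbers the categories in sorted (here alphabetical) order, not calendar order. The small sketch below uses a hypothetical Series of month labels to show the resulting codes, and how passing an explicit category list to `pd.Categorical` yields calendar-order codes instead. Whichever encoding is chosen, any mapping used later at prediction time has to reproduce it exactly.

```python
import pandas as pd

# Hypothetical sample of month labels in the dataset's lowercase format
months = pd.Series(["mar", "oct", "aug", "jan", "dec"])

# .cat.codes numbers the categories in sorted order: aug=0, dec=1, jan=2, mar=3, oct=4
codes = months.astype("category").cat.codes
print(dict(zip(months, codes.tolist())))

# An explicit category list gives calendar-order codes instead (jan=0 ... dec=11)
calendar = ["jan", "feb", "mar", "apr", "may", "jun",
            "jul", "aug", "sep", "oct", "nov", "dec"]
calendar_codes = pd.Categorical(months, categories=calendar, ordered=True).codes
print(dict(zip(months, calendar_codes.tolist())))
```

For a tree model either integer encoding is usable; what breaks predictions is training with one encoding and building inference inputs with the other.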

Building the Random Forest (RF) Model

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_preds = rf.predict(X_test)

def evaluate(y_true, y_pred, model_name):
    # As called below, y_true/y_pred are on the log1p scale, so MAE/RMSE are in log(1 + ha) units
    print(f"--- {model_name} ---")
    print(f"R² Score: {r2_score(y_true, y_pred):.3f}")
    print(f"MAE: {mean_absolute_error(y_true, y_pred):.3f}")
    print(f"RMSE: {np.sqrt(mean_squared_error(y_true, y_pred)):.3f}\n")

evaluate(y_test, rf_preds, "Random Forest")
--- Random Forest ---
R² Score: -0.057
MAE: 1.218
RMSE: 1.525
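An R² of -0.057 on the test set means the forest predicts slightly worse than a constant: always predicting the mean of the test targets gives R² = 0, and anything below 0 is worse than that baseline. The toy sketch below uses made-up numbers (not from the dataset) to compute R² = 1 - SS_res / SS_tot by hand and check it against scikit-learn's `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, the definition used by r2_score."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared errors of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared spread around the mean
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]  # toy values

# Predicting the mean (2.5) everywhere gives exactly R^2 = 0
print(r2_by_hand(y_true, [2.5] * 4))
# A constant of 3.0 is worse than the mean: SS_res = 6, SS_tot = 5, so R^2 = 1 - 6/5 = -0.2
print(r2_by_hand(y_true, [3.0] * 4))
print(r2_score(y_true, [3.0] * 4))
```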

Visualizations

feat_importances = pd.Series(rf.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')
plt.title("Random Forest Feature Importances")
plt.show()

[Figure: Random Forest Feature Importances]

plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title("Feature Correlation Heatmap")
plt.show()

[Figure: Feature Correlation Heatmap]

# Convert log1p-scale targets and predictions back to hectares
y_pred = rf.predict(X_test)
y_test_exp = np.expm1(y_test)
y_pred_exp = np.expm1(y_pred)

plt.figure(figsize=(8, 6))
sns.scatterplot(x=y_test_exp, y=y_pred_exp, alpha=0.7)
plt.plot([0, max(y_test_exp)], [0, max(y_test_exp)], 'r--')  # y = x line: perfect predictions fall on it
plt.xlabel("Actual Burned Area (ha)")
plt.ylabel("Predicted Burned Area (ha)")
plt.title("Predicted vs Actual Burned Area")
plt.grid(True)
plt.show()

[Figure: Predicted vs Actual Burned Area, with the y = x reference line]

errors = y_test_exp - y_pred_exp
sns.histplot(errors, bins=30, kde=True, color='darkorange')
plt.title("Prediction Error Distribution")
plt.xlabel("Prediction Error (ha)")
plt.show()

[Figure: Prediction Error Distribution]

import shap
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

[Figure: SHAP summary plot for the Random Forest on the test set]

Feature (FFMC, DMC, DC, ISI) Modelling

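# Rough linear approximations of the FWI indices from same-day weather
# (not the official Canadian FWI System equations)
def estimate_FFMC(temp, rh, wind, rain):
    # Fine Fuel Moisture Code, clamped to [18, 101]
    return max(18, min(101, 0.5 * temp - 0.3 * rh + 0.4 * wind - 0.5 * rain + 85))
def estimate_DMC(temp, rh, rain):
    # Duff Moisture Code, floored at 0
    return max(0, 0.6 * temp - 0.35 * rh - 1.5 * rain + 33)
def estimate_DC(temp, rain):
    # Drought Code, floored at 0
    return max(0, 0.5 * temp - 2 * rain + 100)
def estimate_ISI(ffmc, wind):
    # Initial Spread Index from the estimated FFMC and wind speed
    return max(0, 0.208 * ffmc * wind / (90 + wind))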
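Unlike the linear `estimate_ISI` above, the Canadian Forest Fire Weather Index (FWI) System, which defines the FFMC, DMC, DC and ISI columns in this dataset, gives ISI in closed form from the current FFMC and wind speed alone. The sketch below assumes the standard Van Wagner (1987) equations with wind in km/h; the function name `isi_fwi` is hypothetical and is not part of the notebook:

```python
import math

def isi_fwi(ffmc: float, wind_kmh: float) -> float:
    """Initial Spread Index from FFMC and wind speed (km/h), assuming the
    standard Canadian FWI System equations (Van Wagner, 1987)."""
    # Fine-fuel moisture content (%) implied by the FFMC
    m = 147.2 * (101.0 - ffmc) / (59.5 + ffmc)
    # Wind function: grows exponentially with wind speed
    f_wind = math.exp(0.05039 * wind_kmh)
    # Fine-fuel moisture function: larger when the fuel is drier (smaller m)
    f_fuel = 91.9 * math.exp(-0.1386 * m) * (1.0 + m ** 5.31 / 4.93e7)
    return 0.208 * f_wind * f_fuel

print(round(isi_fwi(90.0, 10.0), 2))
```

FFMC, DMC and DC have no equivalent closed form from a single day's weather: each FWI code is updated from the previous day's value, so estimating them from one day of observations is necessarily an approximation.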

Test Implementation

weather_data = {
    "temp": 27.0,     # °C
    "RH": 45,         # relative humidity %
    "wind": 12.0,     # km/h (above the 9.4 km/h maximum in the dataset)
    "rain": 0.0,      # mm
    "month": "aug",
    "day": "fri",
    "X": 4,
    "Y": 4
}

FFMC = estimate_FFMC(weather_data["temp"], weather_data["RH"], weather_data["wind"], weather_data["rain"])
DMC = estimate_DMC(weather_data["temp"], weather_data["RH"], weather_data["rain"])
DC = estimate_DC(weather_data["temp"], weather_data["rain"])
ISI = estimate_ISI(FFMC, weather_data["wind"])
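# Must reproduce the training encoding: .cat.codes numbered categories alphabetically
# (assumes all 12 months and 7 weekdays appear in the training data)
months = ['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec']
days = ['mon','tue','wed','thu','fri','sat','sun']
month_map = {m: i for i, m in enumerate(sorted(months))}
day_map = {d: i for i, d in enumerate(sorted(days))}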

input_vector = pd.DataFrame([{
    "X": weather_data["X"],
    "Y": weather_data["Y"],
    "month": month_map[weather_data["month"].lower()],
    "day": day_map[weather_data["day"].lower()],
    "FFMC": FFMC,
    "DMC": DMC,
    "DC": DC,
    "ISI": ISI,
    "temp": weather_data["temp"],
    "RH": weather_data["RH"],
    "wind": weather_data["wind"],
    "rain": weather_data["rain"]
}])

pred_log = rf.predict(input_vector)[0]
predicted_area = np.expm1(pred_log)  # reverse log1p

print(f"Predicted Burned Area: {predicted_area:.3f} hectares")
Predicted Burned Area: 3.167 hectares
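The sample input uses `wind = 12.0` km/h, above the largest wind speed in the dataset (9.4 km/h in the summary statistics). A random forest cannot extrapolate: every split threshold lies within the training range, so any wind above the largest training value follows the same branches as that value. A minimal sketch of a range check, using a small hypothetical frame in place of `X_train` and a hypothetical helper name, flags such inputs before predicting:

```python
import pandas as pd

def out_of_range_features(X_ref: pd.DataFrame, row: pd.DataFrame) -> dict:
    """Return {feature: value} for each column of the single-row frame `row`
    whose value lies outside the [min, max] range observed in X_ref."""
    lo, hi = X_ref.min(), X_ref.max()
    values = row.iloc[0]
    return {col: float(values[col]) for col in X_ref.columns
            if values[col] < lo[col] or values[col] > hi[col]}

# Hypothetical stand-in for X_train, using min/median/max values from df.describe()
X_ref = pd.DataFrame({"temp": [2.2, 19.3, 33.3], "wind": [0.4, 4.0, 9.4]})
sample = pd.DataFrame([{"temp": 27.0, "wind": 12.0}])

# In the notebook this would be out_of_range_features(X_train, input_vector)
print(out_of_range_features(X_ref, sample))
```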
import math

def hectares_to_radius_m(hectares):
    area_m2 = hectares * 10000
    radius = math.sqrt(area_m2 / math.pi)
    return radius
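`hectares_to_radius_m` treats the predicted burned area as a circle: it converts hectares to square metres (1 ha = 10,000 m²) and solves A = πr² for r. For example, for a 3.167 ha prediction:

```latex
r = \sqrt{\frac{A_{\text{ha}} \cdot 10^{4}}{\pi}}
  = \sqrt{\frac{3.167 \times 10^{4}\ \text{m}^2}{\pi}}
  \approx \sqrt{10\,081\ \text{m}^2}
  \approx 100.4\ \text{m}
```

So the circle drawn on the map for such a fire has a radius of roughly 100 m.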
import folium

fire_lat = 41.8
fire_lon = -7.5


predicted_area_ha = predicted_area  # model prediction from the cell above, in hectares
radius_m = hectares_to_radius_m(predicted_area_ha)

fire_map = folium.Map(location=[fire_lat, fire_lon], zoom_start=12)

folium.Circle(
    location=[fire_lat, fire_lon],
    radius=radius_m,  # folium.Circle expects a radius in meters
    popup=f"{predicted_area_ha:.1f} ha estimated fire spread",
    color="red",
    fill=True,
    fill_opacity=0.4
).add_to(fire_map)

fire_map
