The Call for Code Spot Challenge for Wildfires

Predicting Wildfires in Australia

Authors: Marco De Ieso, Andrea Ongaro, Giulia Plotti, Guglielmo Sanchini

Wildfires are among the most common forms of natural disaster in some regions, including Siberia, California, and Australia. Better wildfire forecasting matters for several reasons: it helps communities prepare and respond, it clarifies the root causes, and it helps mitigate wildfires in the future. To this end, many research efforts have used ML techniques with remote sensing data to monitor, predict, and prevent wildfires [1–11].

“A firefighter walks past burning trees during a battle against bushfires around the town of Nowra in the Australian state of New South Wales on December 31, 2019.” Photo credit: Saeed Khan / AFP via Getty Images

In October 2020, IBM proposed the first Data Science Challenge open to internal and external participants. The aim of the challenge was to predict the size of the fire area, in km², in each of the 7 Australian regions for each day of February 2021, using data available up to January 29th. In this article, we discuss how we developed our solution, detailing our learning process and the relevant activities for each phase of the CRISP-DM methodology. Finally, we describe the results achieved and potential improvements to our approach.

Frame the problem

The Call for Code Spot Challenge for Wildfires required that teams predict wildfires in Australia by region and by day for February 2021, using 5 datasets from PAIRS Geoscope and Earthdata NASA:

  • Historical wildfires: daily wildfire extent per region, from January 2005 to January 29th, 2021
  • Historical weather: daily weather parameters per region, such as temperature, soil temperature, humidity, precipitation, and wind speed, since January 2005
  • Historical weather forecasts: forecasts at 5/10/15 days, containing the same parameters as the historical weather dataset, since June 2005
  • Historical vegetation index: the vegetation index aggregated per region on a monthly basis, from January 2005

Because weather forecasts are available at most 15 days in advance, forecast data can support predictions for the first half of February 2021, whereas no such information is available for the second half of the month. The modeling strategy has to take this into account.

Link to the GitHub repository:


The study can be divided into two major phases.

In the first phase of the study, we carried out a systematic review of published papers on the subject to get familiar with the domain and to understand which data and which ML techniques have been adopted. From meteorological data it is possible to compute parameters that have been shown to be related to wildfires, such as the drought index, the standardized precipitation index, and the evapotranspiration index. From satellite images it is possible to compute another class of wildfire-related parameters, such as the normalized vegetation index, the wood index, and the leaf area index.

Process diagram showing the different phases of CRISP-DM

In the second phase of the study, we developed our solution by following the guidelines provided by the CRISP-DM framework.

Data understanding

The analysis of the literature [1–11] showed that fires may be driven by two types of variables: endogenous variables, i.e., the factors that characterize wildfires in a given geographical area, and exogenous variables, mainly linked to weather conditions and vegetation.

During the data understanding phase, we analyzed the available data to test our hypotheses and to highlight the relationships between the variables and our target.

We started by analyzing endogenous variables.

First, comparing the frequency of wildfires across regions highlighted two clusters: Queensland, Northern Territory, New South Wales, and Western Australia have historically experienced more fires than the rest of Australia. This suggests building a separate forecast model for each region.

Heatmap of the number of days on which the land area affected by a wildfire was larger than 1

Second, we tested the hypothesis that wildfires in bordering regions are correlated. After all, wildfires don’t know political boundaries. The Granger causality test evaluates whether the past values of one time series help predict another time series. In our case, we ran the test on the series of past wildfires for all pairs of regions. P-values close to zero support the hypothesis of a mutual influence between events in bordering regions.

Lastly, we evaluated whether our target value, the fire extension on a given day, depends on the fire values of the previous days. That is, we evaluated the autoregressive component. Since the dots, which represent all historical observations for a given region, are arranged along the diagonal of the Cartesian plane, it’s reasonable to assume that the autoregressive component plays an important role in predicting wildfires.

Scatterplot of Estimated_fire_area vs the Mean of the past 7 days for two regions: Northern Territory and Western Australia

Moving to exogenous variables, we hypothesized that weather and vegetation would be among the factors that contribute to wildfires. On the one hand, high temperatures, low precipitation, and low humidity contribute to drought, especially during the so-called “bushfire season”, which runs from October to February. On the other hand, dry vegetation fuels the spread of wildfires, whether natural or caused by arson, resulting in wider areas of the country being affected by fires. By analyzing historical data, we observed that both considerations held true for Australia’s wildfires.

Drought can be expressed as the ratio between the number of days with at least 2 mm/m² of rain within a time window (we chose 60 days for our models) and the total amount of precipitation in the same period. Plotting the drought index for a given day against the fire area measured on the same day, we noticed that peaks in wildfire extension occur in periods of high dryness. Many drought indexes have been proposed in the literature, and each of them can be customized with different parametrizations. In our model, we included measures such as PET and SPI, and we relied on a Python library designed specifically for computing weather indexes.

Comparison between the time series of the fire area and of a drought index for two regions: Queensland and Victoria
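The ratio described above can be sketched in a few lines. This is a minimal illustration rather than our production code: the function name, the toy precipitation values, and the choice to return None when no rain fell in the window (where the ratio is undefined) are all assumptions.

```python
def rainy_day_ratio(precip_mm, window=60, wet_threshold=2.0):
    """Ratio of wet days to total precipitation over a trailing window.

    `precip_mm` holds daily precipitation for one region, oldest first.
    Returns one value per day: None while the window is incomplete, and
    None when no rain fell in the window (assumed handling, ratio undefined).
    """
    out = []
    for t in range(len(precip_mm)):
        if t + 1 < window:
            out.append(None)
            continue
        recent = precip_mm[t + 1 - window:t + 1]  # window ends on day t
        wet_days = sum(1 for p in recent if p >= wet_threshold)
        total = sum(recent)
        out.append(wet_days / total if total > 0 else None)
    return out


precip = [0.0, 5.0, 1.0, 4.0, 0.0, 0.0, 0.0]  # made-up values, in mm
index = rainy_day_ratio(precip, window=3)
```

A short window is used here only to keep the toy example readable; the models in this article used 60 days.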

Data Preparation

Our data preparation phase consists of three steps:

  • Definition of the perimeter of analysis
  • Data preprocessing
  • Partitioning strategy

The perimeter definition is strongly influenced by the data understanding phase and determines how the next phases are built, as it affects how the mining table is derived and how many models have to be developed. Since each region shows a different historical wildfire profile, we decided to create a dataset for each pair (T, R), where T is the number of days in advance of the prediction and R is the region. We ended up with 28 (the days of February) * 7 (the number of regions) = 196 mining tables.
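To make the (T, R) perimeter concrete, here is a minimal sketch of how one table per lead time and region could be assembled. The region codes, the helper name build_mining_table, and the toy data are assumptions made for illustration; the idea is simply that the features known on day d are paired with the target observed T days later.

```python
from itertools import product

REGIONS = ["NSW", "NT", "QL", "SA", "TA", "VI", "WA"]  # assumed region codes
LEAD_TIMES = range(1, 29)  # T = 1..28 days in advance


def build_mining_table(features, target, lead_time):
    """Pair the features known on day d with the target on day d + lead_time."""
    return [
        (features[d], target[d + lead_time])
        for d in range(len(target) - lead_time)
    ]


# Toy data: 40 days of made-up fire areas per region, one feature per day.
target_by_region = {r: [float(i) for i in range(40)] for r in REGIONS}
features_by_region = {
    r: [{"fire_area_today": v} for v in target_by_region[r]] for r in REGIONS
}

tables = {
    (t, r): build_mining_table(features_by_region[r], target_by_region[r], t)
    for t, r in product(LEAD_TIMES, REGIONS)
}
```

Each of the 196 entries in `tables` can then be used to train a separate model for that lead time and region.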

Moving to data pre-processing, we can identify 4 classes of variables.

The first one is about the autoregressive component. From the historical values of the target, we extracted lag features and rolling functions (min, max, mean, standard deviation).
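As an illustration of this first class, the sketch below derives lag and rolling features from a list of past target values. The function name, the chosen lags, and the window length are assumptions; the important detail is that each rolling window ends on the previous day, so the value being predicted never leaks into its own features.

```python
from statistics import mean, pstdev


def lag_and_rolling_features(series, lags=(1, 7), window=7):
    """Build lag and rolling-window features from past target values only.

    `series` is a list of daily fire areas for one region, oldest first.
    Returns one dict per day; a feature is None when history is too short.
    """
    rows = []
    for t in range(len(series)):
        row = {}
        for lag in lags:
            row[f"lag_{lag}"] = series[t - lag] if t - lag >= 0 else None
        # The window covers days t-window .. t-1, excluding the current day.
        past = series[t - window:t] if t >= window else None
        row[f"roll_min_{window}"] = min(past) if past else None
        row[f"roll_max_{window}"] = max(past) if past else None
        row[f"roll_mean_{window}"] = mean(past) if past else None
        row[f"roll_std_{window}"] = pstdev(past) if past else None
        rows.append(row)
    return rows


fire_area = [0.0, 2.0, 4.0, 6.0, 8.0]  # made-up values
features = lag_and_rolling_features(fire_area, lags=(1, 2), window=3)
```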

The second class models the weather impact on the target. Here, after computing two drought indices (SPI and PET), we derived lag and rolling variables from them. Since we had the weather forecast only for 15 days ahead, but needed to predict 32 days ahead, we used the mean value of each weather measure for the same day of the previous years to estimate the value for the remaining days.
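The fallback for days beyond the forecast horizon can be sketched as follows. The function names and the temperature values are hypothetical; the sketch only shows the idea of averaging a weather measure over the same calendar day of previous years and using that average when no forecast exists.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def day_of_year_climatology(history):
    """Average a weather measure over all past years for each calendar day.

    `history` maps a date to the observed value (e.g. daily temperature).
    """
    by_day = defaultdict(list)
    for day, value in history.items():
        by_day[(day.month, day.day)].append(value)
    return {key: mean(values) for key, values in by_day.items()}


def weather_estimate(day, forecasts, climatology):
    """Use the forecast when one exists, otherwise the historical average."""
    if day in forecasts:
        return forecasts[day]
    return climatology[(day.month, day.day)]


# Made-up temperatures for the same calendar day in three previous years.
history = {
    date(2018, 2, 20): 30.0,
    date(2019, 2, 20): 32.0,
    date(2020, 2, 20): 34.0,
}
forecasts = {date(2021, 2, 10): 29.5}  # inside the 15-day forecast horizon
climatology = day_of_year_climatology(history)

inside_horizon = weather_estimate(date(2021, 2, 10), forecasts, climatology)
beyond_horizon = weather_estimate(date(2021, 2, 20), forecasts, climatology)
```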

The third class is seasonality. We built only two variables: month and day of the year.

Finally, the vegetation index: after transforming it from a monthly to a daily value, we computed some lag indicators.
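One simple way to go from monthly to daily values is to repeat each monthly value on every day of that month, which is what the sketch below assumes; other choices, such as interpolation, are equally possible. The function names and the index values are hypothetical.

```python
import calendar
from datetime import date, timedelta


def monthly_to_daily(monthly):
    """Repeat each monthly value on every day of that month.

    `monthly` maps (year, month) to the vegetation index for that month.
    """
    daily = {}
    for (year, month), value in sorted(monthly.items()):
        n_days = calendar.monthrange(year, month)[1]
        for day in range(1, n_days + 1):
            daily[date(year, month, day)] = value
    return daily


def lagged(daily, day, lag_days):
    """Value observed `lag_days` before `day`, or None if not available."""
    return daily.get(day - timedelta(days=lag_days))


monthly_vi = {(2020, 12): 0.31, (2021, 1): 0.28}  # made-up index values
daily_vi = monthly_to_daily(monthly_vi)
vi_lag_30 = lagged(daily_vi, date(2021, 1, 15), 30)
```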

All these features were merged to obtain the final mining table, which contained more than 200 variables. By applying principal component analysis, we reduced the dimensionality and the collinearity while retaining most of the information.

We split the data into training and test sets keeping in mind that we would be evaluated on February 2021. We therefore used the last three available Februaries (2018, 2019, 2020) and January 2021 as the test set.
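A date-based split of this kind could look like the sketch below. It assumes that every row carries a date field and that all rows outside the four test months go to the training set; both are simplifications made for illustration.

```python
from datetime import date

# Held-out months: the last three Februaries plus January 2021.
TEST_MONTHS = {(2018, 2), (2019, 2), (2020, 2), (2021, 1)}


def split_rows(rows):
    """Split rows, each carrying a `date`, into (train, test) by month."""
    train, test = [], []
    for row in rows:
        key = (row["date"].year, row["date"].month)
        (test if key in TEST_MONTHS else train).append(row)
    return train, test


rows = [
    {"date": date(2017, 2, 14), "fire_area": 12.0},  # made-up values
    {"date": date(2019, 2, 14), "fire_area": 8.0},
    {"date": date(2020, 3, 1), "fire_area": 3.0},
    {"date": date(2021, 1, 29), "fire_area": 5.0},
]
train, test = split_rows(rows)
```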

Modelling & Evaluation

Modelling and evaluation are strongly interconnected. Iterating over both is the best way to converge on a good solution: choosing the best model, the optimal combination of features, and the optimal parameters. Due to the nature of the mining tables we built, we did not adopt a traditional time series approach but a more complex supervised approach instead.

Moreover, given the complexity of the problem, we didn’t rely on one model, but on different tree-based models to generate our prediction: XGBoost, LightGBM and Random Forest.

We finally settled on creating an ensemble of 4 different models for each lead time and region. We used two methods to combine the partial results and maximize the accuracy of our final submissions:

  • Taking the geometric mean of the two central predictions (after sorting them in ascending order)
  • Combining by looking at the results obtained for January 2021.
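The first method can be sketched as follows, assuming four non-negative predictions (fire areas cannot be negative) for the same day and region. The function name and the sample values are hypothetical.

```python
from math import sqrt


def central_geometric_mean(predictions):
    """Geometric mean of the two middle values among four predictions."""
    if len(predictions) != 4:
        raise ValueError("expected exactly four predictions")
    ordered = sorted(predictions)
    low_mid, high_mid = ordered[1], ordered[2]  # drop the min and the max
    return sqrt(low_mid * high_mid)


# Made-up predictions from the four models for one day and one region.
combined = central_geometric_mean([100.0, 4.0, 9.0, 1.0])
```

Dropping the lowest and highest prediction makes the combination robust to a single model going far off, at the cost of damping extreme values.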

To understand the second ensembling method, let’s look at the plots of the error metric (called tot) by number of days of lag in the regions QL and NT for January 2021. Each colored line represents a model, which in turn is the union of 28 different models (one for each day of lag). We use these plots to ensemble the models by selecting, for each number of days of lag, the model that produced the lowest tot error.

Competition’s metric on January 2021 for each lag compared between our four models (Regions: Queensland and Northern Territory)
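The selection logic behind the second method can be sketched as below. The model names and the error values are made up; the sketch assumes one January error per model and per lag, and simply keeps the model with the lowest error at each lag.

```python
def select_best_model_per_lag(errors_by_model):
    """For each lag, pick the model with the lowest validation error.

    `errors_by_model` maps a model name to a list of errors, one per lag.
    """
    n_lags = len(next(iter(errors_by_model.values())))
    return [
        min(errors_by_model, key=lambda name: errors_by_model[name][lag])
        for lag in range(n_lags)
    ]


# Made-up January errors for three lags (hypothetical model names).
january_errors = {
    "xgboost": [3.0, 5.0, 4.0],
    "lightgbm": [2.5, 6.0, 4.5],
    "random_forest": [4.0, 4.0, 3.0],
}
chosen = select_best_model_per_lag(january_errors)
```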

The evaluation phase is dedicated to assessing the performance of the predictive model. Performance depends on the chosen metric, which in turn depends on the business problem at hand.

In this case, the competition metric is a weighted average of the mean absolute error (MAE) and the root mean squared error (RMSE), with weights 0.8 and 0.2 respectively.
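A minimal sketch of this metric is shown below. How the organizers aggregated it across regions and days is not covered here, so the function works on a single pair of actual and predicted series; the function name and the sample values are assumptions.

```python
from math import sqrt


def competition_metric(actual, predicted, w_mae=0.8, w_rmse=0.2):
    """Weighted average of MAE and RMSE, with weights 0.8 and 0.2."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return w_mae * mae + w_rmse * rmse


# Made-up fire areas in km².
score = competition_metric([10.0, 20.0, 30.0], [13.0, 20.0, 26.0])
```

Because RMSE penalizes large errors more than MAE does, the 0.2 weight gives missed peaks some extra cost without letting them dominate the score.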

Different Approaches

On top of the presented approach, we experimented with different modeling techniques aimed at improving our performance. In particular, we applied:

  • A VAR (Vector AutoRegression) model, using the fire series of the territory we were trying to predict together with the bordering territories that Granger-caused it;
  • An LSTM (Long Short-Term Memory) neural network with multiple parallel inputs (one per territory) and multi-step output, predicting the whole month of February 2021.

Unfortunately, both models performed worse than the submitted ensemble models.

We also mention approaches that we didn’t apply but that could potentially have improved our models. For feature selection, Recursive Feature Elimination, a simple Variance Threshold, or SelectKBest could have been used in place of PCA. For ensembling, the alternative strategies that we considered implementing are:

  • Stacking, i.e., using the predictions of the four models as input for a second model;
  • Optimal weights, i.e., computing a list of optimal weights on the training set, which could then be used to compute a weighted average of the four models.
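Since we did not implement the optimal weights strategy, the following is only a sketch of one way it could work: a coarse grid search over non-negative weights that sum to 1, scored with the mean absolute error. The function names, the grid step, and the toy predictions are all assumptions.

```python
from itertools import product


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def best_weights(actual, model_predictions, step=0.1):
    """Grid-search weights (non-negative, summing to 1) minimizing the MAE.

    `model_predictions` is a list with one prediction series per model.
    Returns (weights, score) for the best blend found on the grid.
    """
    ticks = round(1 / step)
    best = None
    for combo in product(range(ticks + 1), repeat=len(model_predictions)):
        if sum(combo) != ticks:
            continue  # keep only weight vectors that sum to 1
        weights = [c / ticks for c in combo]
        blend = [
            sum(w * preds[i] for w, preds in zip(weights, model_predictions))
            for i in range(len(actual))
        ]
        score = mean_absolute_error(actual, blend)
        if best is None or score < best[0]:
            best = (score, weights)
    return best[1], best[0]


# Made-up data: two models err in opposite directions, two are far off.
actual = [10.0, 20.0, 30.0]
model_predictions = [
    [8.0, 18.0, 28.0],
    [12.0, 22.0, 32.0],
    [0.0, 0.0, 0.0],
    [50.0, 50.0, 50.0],
]
weights, score = best_weights(actual, model_predictions, step=0.5)
```

In practice the weights would be fitted on the training set and then applied unchanged to the held-out months, to avoid tuning them on the data used for evaluation.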


The implemented method led to a final score of 9.54. The following plot compares the predictions and actuals for February 2021 by region. Our predictions correctly capture the trend of the actual wildfires but fail to predict peaks, i.e., days on which the fire area changes sharply from the previous day.

There are two possible reasons for that:

  • Ensembling strategy: the prediction is computed as the geometric mean of the models’ predictions, which smooths out extreme values.
  • Data granularity: wildfires are a local phenomenon, so data on a finer spatial grid should be considered to obtain more accurate predictions.

Actual vs Predicted on February 2021

If, instead, we look at the contribution of the single models before the ensembling, we notice that for the first half of February, one of the four models (in green) reproduces the correct profile of the actual value. This happens for two regions, namely New South Wales and Queensland.

Comparison between actual values, ensemble predictions, and the predictions of a single model before ensembling (New South Wales and Queensland)


This article gives a flavor of the analysis process that we implemented to solve the Call for Code Spot Challenge for Wildfires. After a systematic review of published papers on the subject, we explained how we explored the data, performed in-depth analysis, defined the perimeter of analysis, constructed relevant indicators, and modeled and evaluated the results. Finally, we highlighted possible new lines of analysis to improve the accuracy of the predictions by using more granular data.


[1] Agarwal, P., et al. “Big Data and Predictive Analytics in Fire Risk Using Weather Data.” Risk Analysis, vol. 40, no. 7, 2020, pp. 1438–49, doi://

Drought Indexes Related to Fire.

[2] Ghorbanzadeh, Omid, et al. “Spatial Prediction of Wildfire Susceptibility Using Field Survey GPS Data and Machine Learning Approaches.” Fire, vol. 2, no. 3, July 2019, p. 43, doi:10.3390/fire2030043.

[3] Keeley, Jon E. “Fire Intensity, Fire Severity and Burn Severity: A Brief Review and Suggested Usage.” International Journal of Wildland Fire, vol. 18, no. 1, 2009, p. 116, doi:10.1071/WF07049.

[4] Leuenberger, Michael, et al. “Wildfire Susceptibility Mapping: Deterministic vs. Stochastic Approaches.” Environmental Modelling & Software, vol. 101, 2018, pp. 194–203, doi:10.1016/j.envsoft.2017.12.019.

[5] Liang, H., et al. “A Neural Network Model for Wildfire Scale Prediction Using Meteorological Factors.” IEEE Access, vol. 7, 2019, pp. 176746–55, doi:10.1109/ACCESS.2019.2957837.

[6] Ma, Jun. “Real-Time Detection of Wildfire Risk Caused by Powerline Vegetation Faults Using Advanced Machine Learning Techniques.” J. Ma, 2020, p. 9.

[7] Michael, Yaron, et al. “Forecasting Fire Risk with Machine Learning and Dynamic Information Derived from Satellite Vegetation Index Time-Series.” Science of The Total Environment, 2020, p. 142844, doi:10.1016/j.scitotenv.2020.142844.

[8] Preisler, Haiganoush K., and Anthony L. Westerling. “Statistical Model for Forecasting Monthly Large Wildfire Events in Western United States.” Journal of Applied Meteorology and Climatology, vol. 46, no. 7, July 2007, pp. 1020–30, doi:10.1175/JAM2513.1.

[9] Rodrigues, Marcos. “An Insight into Machine-Learning Algorithms to Model Human-Caused Wildfire Occurrence.” Environmental Modelling, 2014, p. 10.

[10] Sayad, Younes Oulad. “Predictive Modeling of Wildfires: A New Dataset and Machine Learning Approach.” Fire Safety Journal, 2019, p. 17.

[11] Tonini, Marj, et al. “A Machine Learning-Based Approach for Wildfire Susceptibility Mapping. The Case Study of the Liguria Region in Italy.” Geosciences, vol. 10, no. 3, Mar. 2020, p. 105, doi:10.3390/geosciences10030105.

Other references to the challenge