Technical Deep Dive

 

This section provides an in-depth look at the technology and methodology behind Forest Foresight, offering insights into our tech stack, data sources, preprocessing techniques, and machine learning approach.

Tech Stack

16.jpg

Our technology stack is designed to provide robust backend processing with versatile frontend options:

Frontend:

  • ArcGIS Online Dashboards: Primary visualization tool

  • WRI Map Builder App: Enhanced visualization capabilities

  • Web Interface: User-friendly access for custom area analysis

  • Open Access Data Repository: Direct data access for researchers and developers

Backend:

  • Server/Computer running the ForestForesight R package

  • AWS S3 Bucket: Cloud storage for prediction data

This architecture allows for scalable processing and flexible data access, catering to a wide range of user needs.
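
As an illustration, prediction data stored in the S3 bucket can be pulled down programmatically. The sketch below is a minimal Python example using boto3; the bucket name, key layout, and file names are hypothetical placeholders, not the actual repository structure.

```python
# Illustrative only: bucket, prefix, and key names below are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

# List prediction rasters available under a hypothetical country prefix.
response = s3.list_objects_v2(Bucket="forest-foresight-predictions",
                              Prefix="predictions/GAB/")
for obj in response.get("Contents", []):
    print(obj["Key"])

# Download one prediction GeoTIFF for local analysis.
s3.download_file("forest-foresight-predictions",
                 "predictions/GAB/2024-01-01.tif",
                 "GAB_2024-01-01.tif")
```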

 

Data Integration and Prediction

17.jpg

Forest Foresight combines multiple data sources to create accurate deforestation predictions:

  1. Near Real-Time Deforestation Data:

    • Optical satellite imagery

    • Radar satellite data

  2. Contextual Data:

    • Forest height

    • Distance to oil palm mills

    • Distance to roads

    • Elevation and slope

    • Agricultural potential

    • Various other relevant factors

This diverse dataset is fed into our XGBoost model, which is trained on historical data with a six-month gap between the input features and the deforestation labels. This allows the model to learn patterns that predict deforestation six months into the future.
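
Conceptually, each input layer contributes one column of a pixel-by-feature matrix, and the label for each pixel records whether deforestation was observed six months after the feature date. The Python sketch below illustrates that pairing with synthetic data; the layer names, values, and model settings are invented for the example and are not the configuration of the ForestForesight R package.

```python
# Illustrative sketch: pairs contextual/alert layers with deforestation labels
# observed six months later, then trains a gradient-boosted tree classifier.
import numpy as np
import xgboost as xgb

n_pixels = 10_000

# Hypothetical feature layers, one value per pixel on the common grid.
features = {
    "recent_alerts":   np.random.rand(n_pixels),
    "forest_height":   np.random.rand(n_pixels) * 40,
    "dist_to_road_km": np.random.rand(n_pixels) * 20,
    "slope_deg":       np.random.rand(n_pixels) * 30,
}
X = np.column_stack(list(features.values()))

# Labels: 1 if the pixel was deforested in the six months after the feature date.
y = (np.random.rand(n_pixels) < 0.05).astype(int)

model = xgb.XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1,
                          eval_metric="logloss")
model.fit(X, y)
risk = model.predict_proba(X)[:, 1]  # per-pixel deforestation risk
```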

Data Sources

18.jpg

Forest Foresight utilizes a wide array of data sources to ensure comprehensive and accurate predictions. A detailed Excel spreadsheet is available, listing all resources used in the model. This includes satellite imagery providers, geospatial databases, and various environmental and socio-economic datasets.

Data Processing and Model Evaluation

 

19.jpg

 

All input data is resampled to a consistent resolution of 0.004 degrees latitude and longitude (approximately 445x445 meters at the equator). This standardization ensures compatibility across different data sources and enables consistent analysis.
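
As an illustration, the resampling step could look like the Python sketch below using rasterio; the file names are placeholders, and the actual ForestForesight preprocessing is implemented in R.

```python
# Illustrative sketch: warp an input layer onto the common 0.004 degree grid.
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling

TARGET_RES = 0.004  # degrees

with rasterio.open("input_layer.tif") as src:
    # Compute the output grid (EPSG:4326 at the target resolution).
    transform, width, height = calculate_default_transform(
        src.crs, "EPSG:4326", src.width, src.height, *src.bounds,
        resolution=TARGET_RES)
    profile = src.profile.copy()
    profile.update(crs="EPSG:4326", transform=transform,
                   width=width, height=height)

    with rasterio.open("input_layer_0004deg.tif", "w", **profile) as dst:
        for band in range(1, src.count + 1):
            reproject(
                source=rasterio.band(src, band),
                destination=rasterio.band(dst, band),
                src_transform=src.transform, src_crs=src.crs,
                dst_transform=transform, dst_crs="EPSG:4326",
                resampling=Resampling.bilinear)
```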

Model evaluation is performed using historical data, calculating:

  • Precision: the share of predicted deforestation events that actually occurred (penalizes false positives)

  • Recall: the share of actual deforestation events that the model correctly predicted (penalizes false negatives)

  • F0.5 score: a weighted harmonic mean of precision and recall that places more emphasis on precision

These metrics allow us to assess and refine the model's performance continually.
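
These metrics can be computed with scikit-learn, as in the minimal sketch below; the labels and predictions are made-up placeholders rather than real evaluation data.

```python
# Illustrative evaluation against held-out historical data.
from sklearn.metrics import precision_score, recall_score, fbeta_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # observed deforestation (1 = deforested)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

precision = precision_score(y_true, y_pred)        # penalizes false positives
recall    = recall_score(y_true, y_pred)           # penalizes false negatives
f_half    = fbeta_score(y_true, y_pred, beta=0.5)  # weights precision more than recall

print(f"precision={precision:.2f} recall={recall:.2f} F0.5={f_half:.2f}")
```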

Machine Learning Algorithm Comparison

 

20.jpg

Through rigorous testing, we've found that XGBoost's gradient-boosted tree ensembles consistently outperform other methods such as Support Vector Machines (SVM) and Neural Networks for our specific use case. XGBoost offers superior predictive power and robustness, making it well suited to the complex task of deforestation prediction.

Data Preprocessing Techniques

 

21.jpg

Several preprocessing techniques are employed to ensure data quality and compatibility:

  1. Reprojection: Aligning all data to a common coordinate reference system

  2. Reclassification: Grouping or categorizing data values for simplified analysis

  3. Rasterization: Converting vector data to raster format for uniform analysis

  4. Resampling: Adjusting the resolution of raster data to our standard 0.004 degree grid

  5. Data Flattening: Transforming multi-dimensional data into a 2D format for machine learning input

These techniques are crucial for creating a consistent and high-quality dataset for our model.
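
To illustrate step 5, the sketch below flattens a hypothetical stack of raster layers into the pixels-by-features matrix expected by a machine learning model; the grid size and layer names are invented for the example.

```python
# Illustrative sketch of data flattening: (n_layers, rows, cols) -> (n_pixels, n_layers).
import numpy as np

rows, cols = 500, 400                      # grid size of the area of interest
layer_names = ["alerts", "forest_height", "dist_to_road", "slope"]

# Hypothetical stack: one 2D array per feature layer on the common grid.
stack = np.stack([np.random.rand(rows, cols) for _ in layer_names], axis=0)

# Flatten: one row per pixel, one column per feature.
X = stack.reshape(len(layer_names), -1).T
print(X.shape)  # (200000, 4)
```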

Time Gap in Model Training

22.jpg

A six-month gap is maintained between the training data and the prediction target. This gap serves two crucial purposes:

  1. It prevents the model from "cheating" by using information that wouldn't be available in a real-world prediction scenario.

  2. It accounts for the lag in confirming deforestation events, ensuring that our training data is complete and accurate.

This approach ensures that our model learns to make genuine predictions rather than simply recapitulating known data.
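
A minimal sketch of the date arithmetic, assuming features observed up to a chosen date and labels drawn from the six months that follow it (the dates themselves are arbitrary examples):

```python
# Illustrative date arithmetic for the six-month gap.
from datetime import date
from dateutil.relativedelta import relativedelta

feature_date = date(2023, 1, 1)                  # last date of data the model may see
label_start = feature_date
label_end = feature_date + relativedelta(months=6)

print("features up to:", feature_date)
print("label window  :", label_start, "to", label_end)
```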

Automatic Feature Selection

23.jpg

Our software incorporates an intelligent feature selection mechanism:

  • For each feature, the system automatically selects the most appropriate dataset based on the date in the filename.

  • It chooses the data closest to, but never after, the chosen date for training, validation, or testing.

This ensures that the model always uses the most relevant and temporally appropriate data for each prediction task, maintaining the integrity of the time-based prediction model.
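
The selection rule can be illustrated with a short Python sketch; the filename pattern ("layername_YYYY-MM-DD.tif") is an assumption made for the example and not necessarily the exact convention used by the package.

```python
# Illustrative sketch: pick the file whose embedded date is closest to,
# but never after, the requested date.
from datetime import date, datetime

def select_closest_file(filenames, target):
    """Return the filename with the latest date that is <= target, or None."""
    dated = []
    for name in filenames:
        # Assume names like "forestheight_2022-07-01.tif".
        stamp = name.rsplit("_", 1)[-1].replace(".tif", "")
        dated.append((datetime.strptime(stamp, "%Y-%m-%d").date(), name))
    eligible = [item for item in dated if item[0] <= target]
    return max(eligible)[1] if eligible else None

files = ["forestheight_2021-01-01.tif",
         "forestheight_2022-07-01.tif",
         "forestheight_2023-01-01.tif"]
print(select_closest_file(files, date(2022, 12, 1)))  # -> forestheight_2022-07-01.tif
```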

By combining these advanced techniques and careful methodological considerations, Forest Foresight delivers highly accurate and reliable deforestation predictions, providing valuable insights for conservation efforts and land management strategies.

XGBoost

To further illustrate our machine learning approach, we present three key concepts through visual representations:

Decision Tree Principles

 

24.jpg

The first image demonstrates the fundamental concept of decision trees:

  • A decision tree is a flowchart-like structure where each internal node represents a "test" on an attribute (e.g., "Is the distance to the nearest road less than 1 km?").

  • Each branch represents the outcome of the test.

  • Each leaf node represents a class label (the decision taken after computing all attributes).

In the context of Forest Foresight, a decision tree might use factors like proximity to roads, forest density, and historical deforestation patterns to classify areas as high or low risk for future deforestation.
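
As a toy illustration, the tests of a single tree could be written as nested conditions like the sketch below; the thresholds and risk classes are invented for the example.

```python
# Illustrative sketch of one decision-tree path: internal nodes test features,
# leaves return a class label.
def classify_pixel(dist_to_road_km, forest_height_m, recent_alerts):
    if dist_to_road_km < 1.0:            # test at the root node
        if recent_alerts > 0:            # test at an internal node
            return "high risk"           # leaf: class label
        return "medium risk"
    if forest_height_m < 5.0:
        return "medium risk"
    return "low risk"

print(classify_pixel(dist_to_road_km=0.4, forest_height_m=22.0, recent_alerts=3))
```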

XGBoost Performance Comparison

25.jpg

The second image illustrates XGBoost's superior performance compared to single decision trees and random forests:

  • Single Decision Tree: Shows moderate performance but is prone to overfitting.

  • Random Forest: Improves upon single trees by aggregating multiple trees, reducing overfitting.

  • XGBoost: Demonstrates the highest performance, particularly in complex scenarios like deforestation prediction.

This visualization underscores why Forest Foresight utilizes XGBoost for its predictive modeling, as it consistently outperforms other tree-based methods in both accuracy and robustness.
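
Such a comparison can be reproduced in outline with scikit-learn and XGBoost on synthetic data, as in the sketch below; the numbers it prints only demonstrate the workflow and say nothing about performance on real deforestation data.

```python
# Illustrative model comparison on synthetic, imbalanced data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score
import xgboost as xgb

X, y = make_classification(n_samples=5000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "xgboost":       xgb.XGBClassifier(n_estimators=200, eval_metric="logloss"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    score = fbeta_score(y_test, model.predict(X_test), beta=0.5)
    print(f"{name:14s} F0.5 = {score:.3f}")
```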

XGBoost Basic Principles

26.jpg

The third image outlines the fundamental principles of XGBoost (eXtreme Gradient Boosting):

  1. Sequential Tree Building: XGBoost builds trees one at a time, where each new tree helps to correct errors made by previously trained trees.

  2. Gradient Boosting: It uses gradient descent to minimize errors, optimizing the loss function with each new tree.

  3. Regularization: XGBoost employs regularization techniques to prevent overfitting, balancing model complexity with predictive power.

  4. Feature Importance: The algorithm automatically calculates feature importance, helping identify the most crucial factors in deforestation prediction.

  5. Handling Missing Data: XGBoost has built-in methods for handling missing data, which is particularly useful in large-scale environmental modeling where data gaps are common.

These principles make XGBoost particularly well-suited for the complex task of deforestation prediction, allowing Forest Foresight to create highly accurate and robust models from diverse environmental and socio-economic data sources.
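
The sketch below shows, on synthetic data, how these principles surface as ordinary XGBoost settings: boosting rounds built sequentially, a learning rate controlling the gradient steps, explicit regularization terms, native handling of missing values, and per-feature importance scores. All values are illustrative.

```python
# Illustrative sketch of the XGBoost principles listed above, on synthetic data.
import numpy as np
import xgboost as xgb

X = np.random.rand(2000, 6)
y = (X[:, 0] > 0.7).astype(int)              # toy target driven by feature 0
X[np.random.rand(*X.shape) < 0.05] = np.nan  # missing values handled natively by XGBoost

model = xgb.XGBClassifier(
    n_estimators=400,      # trees built sequentially, each correcting the last
    learning_rate=0.05,    # gradient step size per boosting round
    max_depth=5,
    reg_lambda=1.0,        # L2 regularization on leaf weights
    reg_alpha=0.1,         # L1 regularization on leaf weights
    eval_metric="logloss",
)
model.fit(X, y)

# Per-feature importance scores computed by the trained model.
print(dict(zip([f"feature_{i}" for i in range(6)], model.feature_importances_)))
```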

By leveraging the power of XGBoost, Forest Foresight can process large amounts of multidimensional data efficiently, capture complex non-linear relationships between variables, and produce reliable predictions of future deforestation risks.