Preprocessing your own datasets

The main reason to use the ForestForesight package is to create your own datasets and use them to improve the model. The instructions below explain how to do this.

Main Functions

The main operations you will need are:

  • Reprojection: Reprojection is the process of converting spatial data from one coordinate system to another. It allows you to align data from different sources or to display data in a desired projection for analysis or visualization.

  • Resampling: Resampling involves changing the cell size or resolution of a raster dataset. It can be used to increase or decrease the spatial resolution of data, often to match the resolution of other datasets or to reduce file size.

  • Reclassification: Reclassification is the process of reassigning values in a raster dataset based on specified criteria. It's commonly used to simplify complex data, create categorical maps from continuous data, or to recode values for analysis.

  • Filtering: Filtering in spatial analysis refers to selecting or highlighting specific data based on certain criteria. It can involve removing unwanted data or emphasizing particular features, often used to focus on areas of interest or to remove noise from datasets.

  • Clipping: Clipping is the process of extracting a portion of a spatial dataset that falls within a defined boundary. It's used to focus on a specific area of interest or to reduce the size of a dataset to a manageable extent.

  • Distance Calculation: Distance calculation in GIS involves computing the spatial distance between features or locations. It can be used to create buffer zones, analyze proximity relationships, or generate distance-based raster surfaces for various spatial analyses.
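As a rough illustration of three of these operations (resampling, clipping, and filtering), the sketch below uses the terra package on a small synthetic raster. The raster, the target grid, the area of interest, and the 0.5 threshold are all made-up stand-ins, not part of the ForestForesight data.

```r
library(terra)

# Synthetic stand-in raster (replace with your own data)
set.seed(42)
r <- rast(nrows = 100, ncols = 100, xmin = 10, xmax = 11, ymin = -1, ymax = 0,
          crs = "EPSG:4326")
values(r) <- runif(ncell(r))

# Resampling: move the data onto a coarser grid with the same extent
target <- rast(nrows = 50, ncols = 50, xmin = 10, xmax = 11, ymin = -1, ymax = 0,
               crs = "EPSG:4326")
r_coarse <- resample(r, target, method = "bilinear")

# Clipping: keep only the cells inside a (hypothetical) area of interest
aoi <- ext(10.2, 10.6, -0.8, -0.4)
r_clip <- crop(r_coarse, aoi)

# Filtering: set cells below an arbitrary threshold to NA
r_filtered <- ifel(r_clip < 0.5, NA, r_clip)
```

For categorical data, method = "near" is usually the safer choice in resample(), since bilinear interpolation would blend class codes.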


Environment setup 

    library(ForestForesight)
    library(terra) # provides rast(), vect() and project() used on this page

    # Run ff_environment beforehand; if you have not, assign ff_folder yourself
    ff_folder <- Sys.getenv("DATA_FOLDER")

    # Use the first tile folder (named like "00N_010E") as the template
    template_folder <- list.files(file.path(ff_folder, "preprocessed", "input"), pattern = "^[0-9]{2}[NS]_[0-9]{3}[EW]$", full.names = TRUE)[1]

    # Use the first GeoTIFF in that folder as the template raster
    template_raster <- rast(list.files(template_folder, pattern = "\\.tif$", full.names = TRUE)[1])

For more information about the config file, see the configuration page: Open-Source Contribution.

Reprojecting a raster dataset

This is useful when your input raster is in a different coordinate reference system (CRS) than your template or has a different resolution or extent.

The sections below give example scripts for preprocessing your own dataset. Make sure you substitute your own files for the example paths when loading data with rast or vect.

Example: Reprojecting a land cover raster

    # Load a land cover raster in a different CRS
    land_cover <- rast("path/to/your/own/dataset.tif") # this is an example! Enter your own dataset

    # Check CRS
    print(crs(land_cover))
    print(crs(template_raster))

    # Reproject using nearest neighbor (best for categorical data)
    land_cover_nearest <- project(land_cover, template_raster, method = "near")

    # Reproject using cubic (often better for continuous data)
    land_cover_cubic <- project(land_cover, template_raster, method = "cubic")

    # Compare results
    par(mfrow = c(1, 2))
    plot(land_cover_nearest, main = "Nearest Neighbor")
    plot(land_cover_cubic, main = "Cubic")

Selecting a single layer from a multi-layer raster

This is common when working with satellite imagery or time series data.

Example: Selecting a single band from a Landsat image

    # Load a multi-band Landsat image
    landsat <- rast("landsat_image.tif") # this is an example! Enter your own dataset

    # Check number of layers
    nlyr(landsat)

    # Select the Near-Infrared band (usually band 5 in Landsat 8)
    nir_band <- landsat[[5]]

    # Standardize to template
    nir_standardized <- project(nir_band, template_raster, method = "cubic")

    # Plot
    plot(nir_standardized, main = "Near-Infrared Band")

Reclassifying categorical data

This is useful for simplifying land cover classes or creating binary masks.

Example: Reclassifying a land cover raster
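A minimal sketch of such a reclassification, using classify() from terra. The land cover raster here is synthetic, and the class codes (1 to 5) and their grouping into forest and non-forest are hypothetical; substitute the legend of your own dataset.

```r
library(terra)

# Synthetic land cover raster with hypothetical class codes 1..5
# (replace with your own dataset loaded via rast())
set.seed(1)
land_cover <- rast(nrows = 10, ncols = 10, xmin = 10, xmax = 11, ymin = -1, ymax = 0,
                   crs = "EPSG:4326")
values(land_cover) <- sample(1:5, ncell(land_cover), replace = TRUE)

# Two-column reclassification matrix: original value -> new value
# Assumption: classes 1 and 2 are forest types, the rest are non-forest
rcl <- matrix(c(1, 1,
                2, 1,
                3, 0,
                4, 0,
                5, 0), ncol = 2, byrow = TRUE)

# Apply the reclassification to get a binary forest (1) / non-forest (0) raster
forest_mask <- classify(land_cover, rcl)
```

The result can then be aligned with the template in the same way as in the reprojection example, using project() with method = "near" so the class values are not interpolated.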

Rasterizing a vector dataset (presence/absence)

This is useful for creating binary masks from vector data.

Example: Creating a forest/non-forest mask from polygon data
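A minimal sketch of how this could look with rasterize() from terra. The template grid and the single forest polygon are synthetic stand-ins; in practice you would use the template_raster from the environment setup and load your polygons with vect().

```r
library(terra)

# Stand-in template grid (replace with your real template_raster)
template_raster <- rast(nrows = 100, ncols = 100, xmin = 10, xmax = 11,
                        ymin = -1, ymax = 0, crs = "EPSG:4326")

# Hypothetical forest polygon given as WKT (replace with vect("your/file.shp"))
forest_polygons <- vect("POLYGON ((10.2 -0.8, 10.8 -0.8, 10.8 -0.2, 10.2 -0.2, 10.2 -0.8))")
crs(forest_polygons) <- "EPSG:4326"

# Bring the polygons into the CRS of the template before rasterizing
forest_polygons <- project(forest_polygons, crs(template_raster))

# Cells covered by a polygon get 1, all other cells get 0
forest_mask <- rasterize(forest_polygons, template_raster, field = 1, background = 0)
```

Setting background = 0 gives a true presence/absence raster; with the default background of NA, cells outside the polygons would be missing instead of 0.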

Rasterizing a vector dataset with a specific attribute

This is useful when you want to preserve numerical or categorical information from the vector data.

Example: Rasterizing administrative boundaries with population data
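A minimal sketch, again using rasterize() from terra but passing an attribute name instead of a constant. The two administrative units, the column name population, and the population figures are invented for illustration.

```r
library(terra)

# Stand-in template grid (replace with your real template_raster)
template_raster <- rast(nrows = 100, ncols = 100, xmin = 10, xmax = 11,
                        ymin = -1, ymax = 0, crs = "EPSG:4326")

# Two hypothetical administrative units (replace with vect("your/file.shp"))
west <- vect("POLYGON ((10 -1, 10.5 -1, 10.5 0, 10 0, 10 -1))")
east <- vect("POLYGON ((10.5 -1, 11 -1, 11 0, 10.5 0, 10.5 -1))")
admin <- rbind(west, east)
crs(admin) <- "EPSG:4326"

# Hypothetical attribute to burn into the raster
admin$population <- c(12000, 45000)

# Each cell receives the attribute value of the polygon that covers it
population_raster <- rasterize(admin, template_raster, field = "population")
```

Note that this assigns the polygon's total to every cell it covers. If you need a per-cell quantity such as population density, divide by the polygon area or cell count first.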

Creating a distance raster from vector data

This is useful for proximity analysis, such as distance to roads or water bodies.

Example: Calculating distance to roads
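A minimal sketch using distance() from terra with a raster and a vector as inputs. The template grid and the single road line are synthetic stand-ins; load your own road network with vect().

```r
library(terra)

# Stand-in template grid (replace with your real template_raster)
template_raster <- rast(nrows = 50, ncols = 50, xmin = 10, xmax = 11,
                        ymin = -1, ymax = 0, crs = "EPSG:4326")

# Hypothetical road given as WKT (replace with vect("your/roads.shp"))
roads <- vect("LINESTRING (10.1 -0.9, 10.9 -0.1)")
crs(roads) <- "EPSG:4326"

# For every cell of the template, compute the distance to the nearest road.
# With a longitude/latitude CRS, terra reports the distance in meters.
road_distance <- distance(template_raster, roads)

plot(road_distance, main = "Distance to roads")
```

On a full-size tile with a dense road network this step can be slow, so it may be worth cropping the vector data to the tile extent first.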

Writing the new dataset to disk

The standard format is: {TILE_ID}_{DATE}_{FEATURE}.tif

Where:

  • {TILE_ID} is the geographic identifier (e.g., "00N_010E")

  • {DATE} is the availability or creation date in YYYY-MM-01 format. The day should always be 01.

  • {FEATURE} is a descriptive name of the raster's content. It should not contain underscores.

Example: "00N_010E_2023-06-01_vegetationdensity.tif"
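The naming rule above can be sketched in base R as follows. The variable names and the validation pattern are illustrative assumptions derived from the format described here, not functions of the ForestForesight package.

```r
# Components of the file name (example values)
tile_id <- "00N_010E"
date    <- as.Date("2023-06-15")
feature <- "vegetationdensity"

# Force the day to 01, as the format requires
date_str <- format(date, "%Y-%m-01")

# Assemble {TILE_ID}_{DATE}_{FEATURE}.tif
file_name <- paste0(tile_id, "_", date_str, "_", feature, ".tif")

# Check the result: tile ID, date ending in -01, feature without underscores
pattern <- "^[0-9]{2}[NS]_[0-9]{3}[EW]_[0-9]{4}-[0-9]{2}-01_[^_]+\\.tif$"
stopifnot(grepl(pattern, file_name))

file_name
# "00N_010E_2023-06-01_vegetationdensity.tif"
```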

When creating new rasters, users should:

  1. Use the same tile identifier as the template raster.

  2. Choose an appropriate date that represents when the data becomes available or relevant.

  3. Select a clear, concise name for the feature they've created.

  4. Ensure the file is saved as a GeoTIFF (.tif extension).

The raster should be stored in the preprocessed/input/{TILE_ID} folder, the same location the environment setup code reads the template from.
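A minimal sketch of writing a new raster with writeRaster() from terra. It follows the preprocessed/input layout used by the environment setup code. The temporary ff_folder, the synthetic raster, and the file name are stand-ins so the example runs on its own; use your real ff_folder and data instead.

```r
library(terra)

# Stand-ins so the sketch runs anywhere (use your real ff_folder and raster)
ff_folder <- tempdir()
tile_id   <- "00N_010E"
new_raster <- rast(nrows = 10, ncols = 10, xmin = 10, xmax = 11, ymin = -1, ymax = 0,
                   crs = "EPSG:4326")
values(new_raster) <- runif(ncell(new_raster))

# Tile folder inside preprocessed/input; create it if it does not exist yet
out_dir <- file.path(ff_folder, "preprocessed", "input", tile_id)
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# File name in the {TILE_ID}_{DATE}_{FEATURE}.tif format
out_file <- file.path(out_dir, paste0(tile_id, "_2023-06-01_vegetationdensity.tif"))

# The .tif extension makes terra write a GeoTIFF
writeRaster(new_raster, out_file, overwrite = TRUE)
```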
