Open Source Skill Level Test

R skill level test

If you would like to use the ForestForesight package, you need some basic scripting knowledge in R. We have provided the following twenty questions to test your knowledge. If you can answer about 60% of them correctly, you have the skills required to use the ForestForesight package. The answers are listed after each set of questions.

  1. What is the most commonly used GUI (Graphical User Interface) for R?
    a) RStudio
    b) SPSS
    c) Jupyter Notebook
    d) PyCharm

  2. How do you install a package in R?
    a) install.packages("package_name")
    b) library(package_name)
    c) require(package_name)
    d) get.package("package_name")

  3. Which function is used to load a package in R?
    a) load("package_name")
    b) use("package_name")
    c) library("package_name")
    d) import("package_name")

  4. How do you create a simple scatter plot in R?
    a) scatter(x, y)
    b) plot(x, y)
    c) ggplot(x, y)
    d) scatterplot(x, y)

  5. What command is used to update a package from GitHub?
    a) update.packages("username/repo")
    b) devtools::install_github("username/repo")
    c) github::update("username/repo")
    d) install.packages("github::username/repo")

  6. How do you get a summary of a variable 'x' in R?
    a) describe(x)
    b) summarize(x)
    c) summary(x)
    d) stats(x)

  7. In RStudio, how can you view the contents of a variable?
    a) Click on the variable name in the Environment pane
    b) Type the variable name in the console
    c) Use the View() function
    d) All of the above

  8. What function is used to read a CSV file in R?
    a) read.csv()
    b) import.csv()
    c) load.csv()
    d) open.csv()

  9. How do you write data to a CSV file in R?
    a) save.csv()
    b) export.csv()
    c) write.csv()
    d) output.csv()

  10. Which of the following is the correct way to assign a value to a variable in R?
    a) x <- 5
    b) x = 5
    c) x := 5
    d) Both a and b

  11. What is the data type for whole numbers in R?
    a) int
    b) integer
    c) whole
    d) num

  12. How do you create a vector of numbers from 1 to 10 in R?
    a) vector(1:10)
    b) c(1:10)
    c) seq(1,10)
    d) Both b and c

  13. Which function is used to get the current working directory in R?
    a) getwd()
    b) pwd()
    c) current.dir()
    d) dir.now()

  14. How do you create a data frame from vectors in R?
    a) as.data.frame()
    b) create.data.frame()
    c) data.frame()
    d) make.data.frame()

  15. What does the function head() do in R?
    a) Returns the first element of a vector
    b) Shows the first few rows of a data frame
    c) Displays the column names of a data frame
    d) Retrieves the top-level directory

  16. How do you remove missing values from a vector 'x' in R?
    a) remove.na(x)
    b) x[!is.na(x)]
    c) delete.na(x)
    d) clean(x)

  17. Which function is used to get help on a specific R function?
    a) help(function_name)
    b) ?function_name
    c) info(function_name)
    d) Both a and b

  18. How do you concatenate two strings 'a' and 'b' in R?
    a) a + b
    b) concat(a, b)
    c) paste(a, b)
    d) join(a, b)

  19. What does the function length() return when applied to a data frame?
    a) Number of rows
    b) Number of columns
    c) Total number of elements
    d) Length of the longest column

  20. How do you install multiple packages at once in R?
    a) install.packages(c("package1", "package2", "package3"))
    b) install_multiple(package1, package2, package3)
    c) library(package1, package2, package3)
    d) get.packages(package1, package2, package3)

Answers to R scripting questions

  1. a) RStudio

  2. a) install.packages("package_name")

  3. c) library("package_name")

  4. b) plot(x, y)

  5. b) devtools::install_github("username/repo")

  6. c) summary(x)

  7. d) All of the above

  8. a) read.csv()

  9. c) write.csv()

  10. d) Both a and b

  11. b) integer

  12. d) Both b and c

  13. a) getwd()

  14. c) data.frame()

  15. b) Shows the first few rows of a data frame

  16. b) x[!is.na(x)]

  17. d) Both a and b

  18. c) paste(a, b)

  19. b) Number of columns

  20. a) install.packages(c("package1", "package2", "package3"))
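
Several of the answers above can be tried out directly in an R session. The snippet below is a minimal sketch using only base R; the variable names, column names, and values are made up for illustration.

```r
# Assignment: both forms work; `<-` is the conventional style
x <- 5
y = 5

# Two ways to build the vector 1..10
v1 <- c(1:10)
v2 <- seq(1, 10)

# Build a data frame from vectors (hypothetical column names)
df <- data.frame(id = 1:4, score = c(2.5, NA, 3.7, 4.1))

# head() shows the first rows; length() counts columns, nrow() counts rows
first_rows <- head(df, 2)
n_cols <- length(df)  # 2
n_rows <- nrow(df)    # 4

# Drop missing values from a vector
clean_scores <- df$score[!is.na(df$score)]

# Concatenate strings (paste() inserts a space by default)
label <- paste("forest", "foresight")

# Round trip through a CSV file in a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_back <- read.csv(csv_path)

# Summary statistics of a variable (NA values are counted separately)
summary(df$score)
```

Note that length() on a data frame returns the number of columns, which is why nrow() is shown next to it.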

GIS and Spatial Data in R (Using the terra package)

If you want to start using your own datasets, you also need to know how to handle spatial data, preferably in R, because the rest of the software is written in R as well. We have provided the following 15 questions. Again, scoring about 60% should be sufficient to feel confident about integrating your own datasets.

  1. What is the primary difference between raster and vector data in GIS?
    a) Raster data is continuous, vector data is discrete
    b) Raster data uses pixels, vector data uses points, lines, and polygons
    c) Raster data is for imagery, vector data is for maps
    d) Raster data is 3D, vector data is 2D

  2. Which file format is commonly used for raster data in GIS?
    a) Shapefile
    b) GeoTIFF
    c) GeoJSON
    d) KML

  3. What is a shapefile primarily used for in GIS?
    a) Storing raster data
    b) Storing vector data
    c) Compressing spatial data
    d) Encrypting spatial data

  4. Which R package is recommended for working with raster data in this test?
    a) sf
    b) raster
    c) terra
    d) sp

  5. How do you typically open a GeoTIFF file using the terra package in R?
    a) terra::readGeoTIFF("file.tif")
    b) terra::rast("file.tif")
    c) terra::open("file.tif")
    d) terra::read_geotiff("file.tif")

  6. Which function in the terra package would you use to read a shapefile?
    a) terra::readShapefile("file.shp")
    b) terra::vect("file.shp")
    c) terra::readOGR("file.shp")
    d) terra::read_sf("file.shp")

  7. How can you view the attributes of a vector dataset loaded with terra?
    a) attributes(data)
    b) data@data
    c) values(data)
    d) data

  8. What does the terra::crs() function do when applied to a spatial object?
    a) Changes the coordinate reference system
    b) Retrieves the coordinate reference system
    c) Checks if the coordinate reference system is valid
    d) Compares two coordinate reference systems

  9. Which function would you use to plot a raster layer in terra?
    a) terra::plot()
    b) plot()
    c) terra::map()
    d) terra::visualize()

  10. How can you extract values from a raster at specific point locations using terra?
    a) terra::extract()
    b) terra::sample()
    c) terra::point_extract()
    d) terra::value_at()

  11. What is GeoJSON primarily used for in GIS?
    a) Storing raster data
    b) Encoding various geographical data structures
    c) Compressing large spatial datasets
    d) Creating map projections

  12. Which function in terra would you use to combine multiple rasters into one, averaging the values where they overlap?
    a) terra::merge()
    b) terra::mosaic()
    c) terra::combine()
    d) terra::stack()

  13. How can you resample a raster to a different resolution using terra?
    a) terra::resample()
    b) terra::aggregate()
    c) terra::resize()
    d) terra::rescale()

  14. What does the terra::crop() function do?
    a) Removes pixels with no data
    b) Cuts a raster to a specified extent
    c) Reduces the number of bands in a raster
    d) Converts a raster to a vector

  15. How can you calculate the area of polygons in a vector dataset using terra?
    a) terra::area()
    b) terra::calculate_area()
    c) terra::polygonArea()
    d) terra::expanse()

Answers to GIS Questions

  1. b) Raster data uses pixels, vector data uses points, lines, and polygons

  2. b) GeoTIFF

  3. b) Storing vector data

  4. c) terra

  5. b) terra::rast("file.tif")

  6. b) terra::vect("file.shp")

  7. c) values(data)

  8. b) Retrieves the coordinate reference system

  9. b) plot()

  10. a) terra::extract()

  11. b) Encoding various geographical data structures

  12. b) terra::mosaic()

  13. a) terra::resample()

  14. b) Cuts a raster to a specified extent

  15. d) terra::expanse()
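
The terra functions named in the answers above can be tried on small synthetic objects. This is a rough sketch that assumes the terra package is installed; the raster, polygon, and point locations are invented for illustration and do not come from ForestForesight data.

```r
library(terra)

# Synthetic raster: 10 x 10 cells covering 0-10 degrees in WGS84
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:4326")
values(r) <- 1:ncell(r)

# Synthetic polygon layer with one attribute column
p <- as.polygons(ext(2, 6, 2, 6), crs = "EPSG:4326")
p$name <- "plot_a"

# Retrieve the coordinate reference system and the attribute table
r_crs <- crs(r)
p_attributes <- values(p)

# Plot the raster and draw the polygon on top
plot(r)
plot(p, add = TRUE)

# Extract raster values at point locations
pts <- vect(cbind(c(2.5, 7.5), c(2.5, 7.5)), crs = "EPSG:4326")
at_points <- extract(r, pts)

# Crop the raster to the extent of the polygon
cropped <- crop(r, p)

# Resample to a coarser grid (5 x 5 cells over the same extent)
target <- rast(nrows = 5, ncols = 5, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
               crs = "EPSG:4326")
coarse <- resample(r, target, method = "bilinear")

# Mosaic two overlapping pieces, averaging values in the overlap
left <- crop(r, ext(0, 6, 0, 10))
right <- crop(r, ext(4, 10, 0, 10))
combined <- mosaic(left, right, fun = "mean")

# Polygon area in square kilometres
area_km2 <- expanse(p, unit = "km")
```

With your own data, rast("file.tif") and vect("file.shp") would replace the synthetic raster and polygon.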

General Machine Learning Questions

ForestForesight uses XGBoost, a machine learning algorithm, to predict deforestation. Some basic knowledge of machine learning is recommended if you want to continue using the open-source alternative. Test your knowledge below. Again, scoring about 60% should be sufficient to continue with ForestForesight. Before you start tweaking ff_train, we recommend gaining a more thorough understanding of XGBoost. For more resources on this topic, see External Resources.

  1. What is the main difference between supervised and unsupervised learning?
    a) Supervised learning requires more data
    b) Unsupervised learning is always more accurate
    c) Supervised learning uses labeled data, unsupervised learning uses unlabeled data
    d) Supervised learning is for classification, unsupervised is for regression

  2. Which of the following is an example of a classification problem?
    a) Predicting house prices
    b) Estimating a person's age
    c) Identifying spam emails
    d) Forecasting stock prices

  3. What is overfitting in machine learning?
    a) When a model performs well on training data but poorly on new data
    b) When a model is too simple to capture the underlying patterns
    c) When a model requires too much computational power
    d) When a model has too few parameters

  4. Which technique is commonly used to prevent overfitting?
    a) Increasing model complexity
    b) Cross-validation
    c) Using all available features
    d) Training on the entire dataset

  5. What is the purpose of the train-test split in machine learning?
    a) To speed up the training process
    b) To evaluate model performance on unseen data
    c) To increase model accuracy
    d) To reduce computational requirements

  6. Which of the following is NOT a common evaluation metric for classification problems?
    a) Accuracy
    b) Precision
    c) Recall
    d) Mean Squared Error

  7. What is the main goal of feature scaling in machine learning?
    a) To increase the number of features
    b) To remove outliers from the dataset
    c) To bring all features to a similar range
    d) To reduce the dimensionality of the dataset

  8. Which algorithm is based on the principle of maximizing the margin between classes?
    a) K-Nearest Neighbors
    b) Naive Bayes
    c) Support Vector Machine
    d) Decision Tree

  9. What is the primary purpose of Principal Component Analysis (PCA)?
    a) Feature selection
    b) Dimensionality reduction
    c) Model evaluation
    d) Data augmentation

  10. In the context of neural networks, what does an activation function do?
    a) Determines the learning rate
    b) Introduces non-linearity to the model
    c) Defines the number of neurons in each layer
    d) Calculates the loss function

Answers to Machine Learning Questions

  1. c) Supervised learning uses labeled data, unsupervised learning uses unlabeled data

  2. c) Identifying spam emails

  3. a) When a model performs well on training data but poorly on new data

  4. b) Cross-validation

  5. b) To evaluate model performance on unseen data

  6. d) Mean Squared Error

  7. c) To bring all features to a similar range

  8. c) Support Vector Machine

  9. b) Dimensionality reduction

  10. b) Introduces non-linearity to the model
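
To make the train-test split, feature scaling, and classification metrics from the answers above concrete, here is a small sketch in base R. It uses synthetic data and a plain logistic regression (glm) as a stand-in classifier; it is not how ForestForesight or ff_train works, and all names and numbers are made up.

```r
set.seed(42)  # make the random data and split reproducible

# Synthetic binary classification data (hypothetical features x1 and x2)
n <- 200
features <- data.frame(x1 = rnorm(n), x2 = rnorm(n, mean = 50, sd = 10))
label <- as.integer(features$x1 + (features$x2 - 50) / 10 + rnorm(n) > 0)

# Train-test split: hold out 30% of the rows as unseen data
test_idx <- sample(n, size = 0.3 * n)

# Feature scaling: fit the centring and scaling on the training rows only,
# then apply the same transformation to the test rows
train_x <- scale(features[-test_idx, ])
test_x <- scale(features[test_idx, ],
                center = attr(train_x, "scaled:center"),
                scale = attr(train_x, "scaled:scale"))

train <- data.frame(train_x, label = label[-test_idx])
test <- data.frame(test_x, label = label[test_idx])

# Fit the model on the training rows only
model <- glm(label ~ x1 + x2, data = train, family = binomial)

# Predict on the held-out rows and threshold the probabilities at 0.5
prob <- predict(model, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Classification metrics from confusion-matrix counts
tp <- sum(pred == 1 & test$label == 1)
fp <- sum(pred == 1 & test$label == 0)
fn <- sum(pred == 0 & test$label == 1)
accuracy <- mean(pred == test$label)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)

# Compare with accuracy on the training rows; a much higher training
# accuracy than test accuracy would be a sign of overfitting
train_pred <- as.integer(predict(model, type = "response") > 0.5)
train_accuracy <- mean(train_pred == train$label)
```

Scaling is fitted on the training rows only so that no information from the held-out rows leaks into the model.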