Introduction

Dominant tree species information is fundamental to forest management — species differ in ecological roles, economic value, and cultural importance. Traditional field surveys supply these data but are costly and spatially limited. Airborne Laser Scanning (ALS) now enables modelling of forest attributes, including dominant species, across large areas with high structural detail.

This project develops and evaluates a Python-based workflow to predict dominant tree species at the stand level in the Petawawa Research Forest (PRF), Ontario, using ALS-derived structural metrics and a Random Forest classifier. A central research question is whether ALS data alone is sufficient for species classification, or whether fusing it with Landsat 8 multispectral imagery substantially improves accuracy.

Figure 1. Study area — Petawawa Research Forest (PRF), Ontario.

Methods

Packages

arcpy, rasterio, pandas, numpy, matplotlib, scikit-learn (imported as sklearn), glob, os

Feature Extraction

Topographic metrics (slope and aspect) were derived from the PRF 2012 DTM using ArcPy's Spatial Analyst extension. Seven stand-level metrics from the 2018 EFI surfaces — basal area, total aboveground biomass (summed across four size classes), dominant/codominant height, quadratic mean DBH, Lorey's height, stand density, and gross total volume — were loaded as raster layers. Landsat 8 Bands 2–7 (Blue through SWIR2) were included as multispectral predictors; the coastal aerosol band was excluded.

import glob
import os

import arcpy
import numpy as np
import pandas as pd
import rasterio
import matplotlib.pyplot as plt
from arcpy.sa import Aspect, Raster, Slope
from matplotlib.colors import ListedColormap, BoundaryNorm

# DTM, the BIO_* rasters, and landsatpath are assumed to be paths defined earlier.
arcpy.CheckOutExtension("Spatial")  # arcpy.sa tools need a Spatial Analyst licence

# ── Topographic metrics from 2012 DTM ───────────────
aspect_out = Aspect(DTM)
slope_out = Slope(DTM, output_measurement="DEGREE")

# ── Stand metrics from 2018 EFI surfaces ─────────────
# Total aboveground biomass = sum of the four size-class surfaces
biomass = (Raster(BIO_Poles) + Raster(BIO_Large)
           + Raster(BIO_Small) + Raster(BIO_Medium))
# basal_area, height_dominant, DBH,
# height (Lorey's), stand_density, volume

# ── Landsat 8 bands 2–7 (drop coastal aerosol) ───────
in_rasters = sorted(glob.glob(os.path.join(landsatpath, "*_SR_B*.tif")))
# Drop Band 1 (coastal aerosol) by file name rather than by list position
in_rasters = [p for p in in_rasters if not p.endswith("_SR_B1.tif")]

Model Training & Prediction

All 15 feature layers (7 stand metrics + 6 spectral bands + slope + aspect) were stacked into a composite raster. Species represented by fewer than 3 ground plot observations were grouped into an "Other" class to avoid model instability. A RandomForestClassifier was trained on a 70/30 train-validation split of the plot data and then applied pixel-by-pixel to the full landscape composite to produce a wall-to-wall species map.
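The training and prediction steps described above can be sketched as follows. This is a minimal illustration on synthetic arrays, not the project's code: the plot table, the class labels (`sp_a`, `rare`, and so on), the composite dimensions, `MIN_SAMPLES`, and the classifier settings are all placeholders chosen for the example.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 200 ground plots x 15 features.
n_plots, n_features = 200, 15
X = rng.normal(size=(n_plots, n_features))
species = [str(s) for s in rng.choice(["sp_a", "sp_b", "sp_c", "sp_d"], size=n_plots)]
species[0] = species[1] = "rare"  # a class with only 2 plot observations

# Group classes with fewer than MIN_SAMPLES observations into "Other".
MIN_SAMPLES = 3
counts = Counter(species)
y = np.array([s if counts[s] >= MIN_SAMPLES else "Other" for s in species])

# 70/30 train-validation split, then fit the Random Forest.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Pixel-wise prediction: flatten a (bands, rows, cols) composite to
# (pixels, bands), predict, and reshape back to the raster grid.
composite = rng.normal(size=(n_features, 20, 30))  # synthetic composite
bands, rows, cols = composite.shape
pixels = composite.reshape(bands, -1).T
species_map = clf.predict(pixels).reshape(rows, cols)
```

With real rasters the composite would be read from disk (for example with rasterio), and NoData pixels would need to be masked before calling `predict`; neither step is shown here.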

Results

ALS structural metrics alone produced low overall accuracy (Table 1). Adding Landsat 8 multispectral bands substantially improved per-class precision and recall (Table 2), indicating that data fusion is needed for reliable species classification in this study area.
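One way to set up such a comparison is to train the same classifier on two column subsets and compare the classification reports. The sketch below is illustrative only. It assumes a hypothetical column order (9 ALS- and DTM-derived columns followed by 6 Landsat bands) and uses random synthetic data, so the scores it prints carry no meaning. Whether slope and aspect belonged to the ALS-only model is also an assumption here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical column layout of the 15-feature table (assumption):
# columns 0-8 are ALS/DTM-derived metrics, columns 9-14 are Landsat 8 bands.
ALS_COLS = list(range(0, 9))
ALL_COLS = list(range(0, 15))


def evaluate(X, y, cols, seed=0):
    """Train on the chosen columns and return the validation report as a dict."""
    X_train, X_val, y_train, y_val = train_test_split(
        X[:, cols], y, test_size=0.30, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return classification_report(
        y_val, clf.predict(X_val), output_dict=True, zero_division=0
    )


# Synthetic placeholder data: random features and random labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 15))
y = rng.choice(["class_a", "class_b", "class_c"], size=300)

report_als = evaluate(X, y, ALS_COLS)
report_fused = evaluate(X, y, ALL_COLS)
print("ALS only :", report_als["accuracy"])
print("ALS + L8 :", report_fused["accuracy"])
```

Using the same `seed` for both runs keeps the train and validation rows identical, so any difference between the two reports comes from the feature set rather than from the split.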

Table 1. Classification report — ALS only.
Table 2. Classification report — ALS + Landsat 8.
Species classes modelled: 17
Input features: 15
Train / validation split: 70/30
Figure 2. Dominant species map across the PRF — Python output.
Figure 3. Dominant species map — ArcGIS Pro.

References