Faculty & Staff Scholarship

Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification

Christopher A. Ramezan, West Virginia University
Timothy A. Warner, West Virginia University
Aaron E. Maxwell, West Virginia University

Document Type

Article

Publication Date

2019

College/Unit

Eberly College of Arts and Sciences

Department/Program/Center

Geology and Geography

Abstract

High spatial resolution (1–5 m) remotely sensed datasets are increasingly used to map land cover over large geographic areas with supervised machine learning algorithms. Although many studies have compared machine learning classification methods, two related choices are rarely investigated, particularly on large, high spatial resolution datasets: the sample selection method used to acquire training and validation data, and the cross-validation technique used to tune classifier parameters. This work, therefore, examines four sample selection methods (simple random, proportional stratified random, disproportional stratified random, and deliberative sampling) and three cross-validation tuning approaches (k-fold, leave-one-out, and Monte Carlo). In addition, it investigates how accuracy is affected when sample selection is localized to a small geographic subset of the entire area, an approach that is sometimes used to reduce the cost of training data collection. These methods are investigated in the context of support vector machine (SVM) classification and geographic object-based image analysis (GEOBIA), using high spatial resolution National Agriculture Imagery Program (NAIP) orthoimagery and LIDAR-derived rasters covering a 2,609 km² regional-scale area in northeastern West Virginia, USA. Stratified statistical-based sampling methods generated the highest classification accuracy. A small number of training samples collected from only a subset of the study area provided a similar level of overall accuracy to a sample of equivalent size collected in a dispersed manner across the entire regional-scale dataset. Differences in accuracy among the cross-validation tuning methods were minimal. The processing times for Monte Carlo and leave-one-out cross-validation were high, especially with large training sets. For this reason, k-fold cross-validation appears to be a good choice.
Classifiers trained with samples collected deliberately (i.e., not randomly) were less accurate than classifiers trained with statistical-based samples. This may be due to the high positive spatial autocorrelation in the deliberative training set. Thus, if possible, samples for training should be selected randomly; deliberative samples should be avoided.
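
As a rough illustration of the statistical sampling designs and cross-validation schemes named in the abstract, the following standard-library Python sketch may help. It is not the authors' implementation: the function names, the class labels, and the sample sizes are hypothetical, and "disproportional" is assumed here to mean an equal allocation per class, which is only one common form of that design. Deliberative sampling is omitted because it depends on analyst judgment rather than a random draw.

```python
import random
from collections import Counter, defaultdict


def simple_random(labels, n, rng):
    """Draw n indices uniformly at random, ignoring class membership."""
    return rng.sample(range(len(labels)), n)


def stratified(labels, n, rng, proportional=True):
    """Draw about n indices class by class.

    proportional=True  -> each class quota follows its share of the population.
    proportional=False -> equal quota per class (an assumed form of
                          disproportional stratified sampling).
    """
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    picked = []
    for c, idx in sorted(by_class.items()):
        if proportional:
            quota = round(n * len(idx) / len(labels))
        else:
            quota = n // len(by_class)
        picked += rng.sample(idx, min(quota, len(idx)))
    return picked


def k_fold(n, k):
    """Yield (train, test) index lists; every index is tested exactly once."""
    idx = list(range(n))
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        yield [i for i in idx if i not in held], test


def leave_one_out(n):
    """Leave-one-out is k-fold with k equal to the number of samples."""
    return k_fold(n, n)


def monte_carlo(n, repeats, test_fraction, rng):
    """Yield repeated random train/test splits; test sets may overlap."""
    size = max(1, int(n * test_fraction))
    for _ in range(repeats):
        test = rng.sample(range(n), size)
        held = set(test)
        yield [i for i in range(n) if i not in held], test


# Hypothetical imbalanced population of 100 labelled image objects.
rng = random.Random(0)
labels = ["forest"] * 70 + ["grass"] * 20 + ["water"] * 10

prop = Counter(labels[i] for i in stratified(labels, 20, rng, proportional=True))
equal = Counter(labels[i] for i in stratified(labels, 21, rng, proportional=False))

# Number of model fits needed to score ONE candidate parameter setting.
fits_k_fold = sum(1 for _ in k_fold(len(labels), 10))
fits_loo = sum(1 for _ in leave_one_out(len(labels)))
fits_mc = sum(1 for _ in monte_carlo(len(labels), 25, 0.2, rng))
```

In this sketch the number of model fits per candidate parameter setting is fixed at k for k-fold and at the chosen repeat count for Monte Carlo, but grows with the training set size for leave-one-out. That is consistent with the abstract's observation that leave-one-out processing time was high for large training sets, although the timings reported in the article depend on the authors' own software and settings.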

Digital Commons Citation

Ramezan, Christopher A.; Warner, Timothy A.; and Maxwell, Aaron E., "Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification" (2019). Faculty & Staff Scholarship. 1314.
https://researchrepository.wvu.edu/faculty_publications/1314

Source Citation

Ramezan, C. A., Warner, T. A., & Maxwell, A. E. (2019). Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification. Remote Sensing, 11(2), 185. https://doi.org/10.3390/rs11020185

Comments

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article received support from the WVU Libraries' Open Access Author Fund.

Included in

Geography Commons
