Document Type


Publication Date



Eberly College of Arts and Sciences


Geology and Geography


High spatial resolution (1–5 m) remotely sensed datasets are increasingly being used to map land covers over large geographic areas using supervised machine learning algorithms. Although many studies have compared machine learning classification methods, sample selection methods for acquiring training and validation data for machine learning, and cross-validation techniques for tuning classifier parameters are rarely investigated, particularly on large, high spatial resolution datasets. This work, therefore, examines four sample selection methods—simple random, proportional stratified random, disproportional stratified random, and deliberative sampling—as well as three cross-validation tuning approaches—k-fold, leave-one-out, and Monte Carlo methods. In addition, the effect on the accuracy of localizing sample selections to a small geographic subset of the entire area, an approach that is sometimes used to reduce costs associated with training data collection, is investigated. These methods are investigated in the context of support vector machines (SVM) classification and geographic object-based image analysis (GEOBIA), using high spatial resolution National Agricultural Imagery Program (NAIP) orthoimagery and LIDAR-derived rasters, covering a 2,609 km2 regional-scale area in northeastern West Virginia, USA. Stratified-statistical-based sampling methods were found to generate the highest classification accuracy. Using a small number of training samples collected from only a subset of the study area provided a similar level of overall accuracy to a sample of equivalent size collected in a dispersed manner across the entire regional-scale dataset. There were minimal differences in accuracy for the different cross-validation tuning methods. The processing time for Monte Carlo and leave-one-out cross-validation were high, especially with large training sets. For this reason, k-fold cross-validation appears to be a good choice. Classifications trained with samples collected deliberately (i.e., not randomly) were less accurate than classifiers trained from statistical-based samples. This may be due to the high positive spatial autocorrelation in the deliberative training set. Thus, if possible, samples for training should be selected randomly; deliberative samples should be avoided.

Source Citation

A. Ramezan, C., A. Warner, T., & E. Maxwell, A. (2019). Evaluation of Sampling and Cross-Validation Tuning Strategies for Regional-Scale Machine Learning Classification. Remote Sensing, 11(2), 185.


This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article received support from the WVU Libraries' Open Access Author Fund.

Included in

Geography Commons



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.