Author ORCID Identifier

https://orcid.org/0009-0000-8253-832X

Semester

Spring

Date of Graduation

2026

Document Type

Thesis (Campus Access)

Degree Type

MS

College

Eberly College of Arts and Sciences

Department

Forensic and Investigative Science

Committee Chair

Arati Iyengar

Committee Member

Luis Arroyo

Committee Member

Ember Morrisey

Abstract

Soil is easily transferred from a crime scene to a suspect’s clothing, shoes, or tires, making it an important form of trace evidence. OSAC’s Trace Materials Subcommittee has called for research and development toward a standardized method for analyzing soil as trace evidence. Using DNA metabarcoding with 16S rRNA for bacteria and archaea, 18S rRNA for eukaryotic microbes, ITS2 for fungi, and rbcL for diatoms, we investigated whether the provenance of surface soils in Monongalia County, West Virginia, can be predicted. Ten sites representing different habitat types and USDA soil types were sampled. At each site, five sets of samples were collected: one at a central point and four at additional points, one in each cardinal direction, at distances of 5, 10, 20, and 50 feet from the central sample. A replicate sample was collected 1-7 days after initial collection from one randomly selected collection point at every site. Additionally, soil was collected on shoes as mock evidence items from a point just outside the 50-foot core sampling radius.

DNA was extracted using the QIAGEN DNeasy PowerSoil Pro Kit, and quantitation was performed using a Quantus™ Fluorometer. Triplicate amplicon PCRs for all four targets, using primers with Illumina adapters, were set up in 25 µL volumes with 25 ng input DNA and 2x KAPA HiFi HotStart ReadyMix. The targets were then pooled for each sample and purified with AmpliClean purification beads. Index PCRs were performed in 25 µL volumes using 2.5 µL each of a pair of unique Nextera XT Index primers for each sample, the 2x KAPA HiFi polymerase master mix, and 2.5 µL of purified product. Samples were sequenced on an Illumina MiSeq FGx with the Illumina v3 reagent kit cartridge. FASTQ files were analyzed with Illumina’s DRAGEN secondary analysis software using the 16S Plus app. The internal RefSeq-RDP database was used for 16S rRNA species identification, and custom databases were used for the other targets (PR2 for 18S rRNA, Diat.barcode for rbcL, and UNITE for ITS2). A random forest model was trained on taxonomic abundance data, filtered to include the top 35% of species, from four randomly selected core samples of the five at each site (n=40). The model was then tested on the withheld core sample (n=10), the replicate sample (n=10), and the shoe sample (n=10) from each site. With the 16S rRNA, 18S rRNA, and ITS2 targets combined, the model predicted provenance with 90% accuracy for core samples, 100% accuracy for replicate samples, and 80% accuracy for shoe samples. rbcL data were not used for machine learning, as their inclusion reduced prediction accuracy.
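The hold-out design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the thesis: the site labels and function names are hypothetical, ranking species by summed abundance is an assumed reading of "top 35% of species", and the random forest classifier itself is omitted.

```python
import math
import random


def top_fraction_species(abundance, fraction=0.35):
    """Return the species ranked in the top `fraction` by total abundance.

    `abundance` maps a sample id to a {species: count} dict. Ranking by
    summed abundance is an assumption; the abstract does not state the
    criterion that was used.
    """
    totals = {}
    for counts in abundance.values():
        for species, n in counts.items():
            totals[species] = totals.get(species, 0) + n
    # Sort by descending total, breaking ties alphabetically for stability.
    ranked = sorted(totals, key=lambda s: (-totals[s], s))
    keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:keep]


def split_core_samples(sites, cores_per_site=5, seed=0):
    """For each site, withhold one randomly chosen core sample for testing."""
    rng = random.Random(seed)
    train, test = [], []
    for site in sites:
        cores = [(site, i) for i in range(cores_per_site)]
        held_out = rng.choice(cores)
        test.append(held_out)
        train.extend(c for c in cores if c != held_out)
    return train, test


sites = [f"site{k}" for k in range(1, 11)]  # ten hypothetical site labels
train, test = split_core_samples(sites)
print(len(train), len(test))  # 40 training and 10 withheld core samples
```

Withholding one core sample per site keeps every site represented in both the training and test sets, which is what allows accuracy to be reported per sample type.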

Our findings indicate that replicate sampling in situ 1-7 days after initial collection results in accurate classification to the site of origin every time, and that species’ taxonomic abundance changes over time during ex situ storage of mock evidentiary items, resulting in less than 100% classification accuracy to the site of origin. Finally, our results showed no correlation between Bray-Curtis dissimilarity estimates and either USDA soil type or geographical distance between sites. This study, the first in West Virginia, has taken preliminary steps towards the eventual goal of establishing DNA metabarcoding as the standard method for forensic soil analysis.
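For readers unfamiliar with the metric, Bray-Curtis dissimilarity between two samples is the sum of absolute differences in species abundance divided by the total abundance across both samples. It ranges from 0 (identical composition) to 1 (no shared species). A minimal sketch follows; the function name and the example counts are illustrative and are not taken from the study.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {species: abundance} dicts."""
    species = set(a) | set(b)
    numerator = sum(abs(a.get(s, 0) - b.get(s, 0)) for s in species)
    denominator = sum(a.get(s, 0) + b.get(s, 0) for s in species)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Made-up abundance counts for two hypothetical samples.
sample_1 = {"taxon_A": 6, "taxon_B": 4}
sample_2 = {"taxon_A": 2, "taxon_B": 4, "taxon_C": 4}
print(bray_curtis(sample_1, sample_2))  # 8 / 20 = 0.4
```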

Available for download on Friday, April 30, 2027
