Date of Graduation
2017
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Elaine Eschen
Committee Co-Chair
Alan Barnes
Committee Member
Roy Nutter
Abstract
We examine two proposed indexing algorithms taking advantage of the new SSMinT libraries. The two algorithms primarily differ in their selection of documents for learning. The batch indexing method selects some random number of documents for learning. The iterative indexing method uses a single randomly selected document to discover semantic signatures, which are then used to find additional related documents. The batch indexing method discovers one to three semantic signatures per document, resulting in poor clustering performance as evaluated by human cross-validation of clusters using the Adjusted Rand Index. The iterative indexing method discovers more semantic signatures per document, resulting in far better clustering performance using the same cross-validation method.;Our new tools enable faster development of new experiments, forensic applications, and more. The experiments show that SSMinT can provide effective indexing for text data such as e-mail or web pages. We conclude with areas of future research which may benefit from utilizing SSMinT. (Abstract shortened by ProQuest.).
Recommended Citation
Cecil, Kelly, "Reimagining the SSMinT Software Package" (2017). Graduate Theses, Dissertations, and Problem Reports. 5329.
https://researchrepository.wvu.edu/etd/5329