Statler College of Engineering and Mineral Resources
Lane Department of Computer Science and Electrical Engineering
We examine two proposed indexing algorithms that take advantage of the new SSMinT libraries. The two algorithms differ primarily in how they select documents for learning. The batch indexing method selects a random number of documents for learning. The iterative indexing method uses a single randomly selected document to discover semantic signatures, which are then used to find additional related documents. The batch indexing method discovers one to three semantic signatures per document, resulting in poor clustering performance as evaluated by human cross-validation of clusters using the Adjusted Rand Index. The iterative indexing method discovers more semantic signatures per document, resulting in far better clustering performance under the same cross-validation method.

Our new tools enable faster development of new experiments, forensic applications, and more. The experiments show that SSMinT can provide effective indexing for text data such as e-mail or web pages. We conclude with areas of future research that may benefit from SSMinT. (Abstract shortened by ProQuest.)
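For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of the standard Adjusted Rand Index, which scores the agreement between two partitions of the same documents (for example, human-assigned groups versus algorithm-assigned clusters). It is not taken from the thesis or from SSMinT; the function name `adjusted_rand_index` and the label lists are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items.

    Hypothetical helper for illustration only; not part of SSMinT.
    Returns 1.0 for identical partitions and values near 0.0 for
    agreement no better than chance (negative values are possible).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions trivially agree

    # Contingency counts: how many items share each (a, b) label pair.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of item pairs placed together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_a = sum(comb(c, 2) for c in a_counts.values())
    together_b = sum(comb(c, 2) for c in b_counts.values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in one cluster
    return (together_both - expected) / (maximum - expected)
```

As a usage example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns `1.0`, because the two labelings describe the same grouping under different cluster names, while `adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])` returns `-0.5`, indicating agreement worse than chance.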
Cecil, Kelly, "Reimagining the SSMinT Software Package" (2017). Graduate Theses, Dissertations, and Problem Reports. 5329.