Semester
Fall
Date of Graduation
2012
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Roy S. Nutter
Committee Co-Chair
Bojan Cukic
Committee Member
Tim Menzies
Abstract
Information retrieval is the process of recalling and ordering all relevant documents based on a user's search query. Examples of information retrieval systems are Google, Bing, and Yahoo search. To search effectively, these systems use an inverted index that maps content, such as words, back to the documents in which it appears. It is widely believed that there are two options for implementing an inverted index: in memory or as a file. This investigation examines a third option, implementing the inverted index as a table in a database, and compares it with the other two. In addition, it examines which combination of inverted index implementation and retrieval algorithm performs best, considering TF-IDF, Best Match 25, and a unigram language model with Jelinek-Mercer smoothing. This is determined by designing and developing a system that indexes and searches three collections that differ in data, size, and complexity. The results show that an inverted index implemented in a database is a viable option for information retrieval. It is also noteworthy that Best Match 25 or a unigram language model consistently outperforms TF-IDF. In conclusion, if the collection cannot be indexed in memory, then a database-implemented index is a sufficient second option.
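To make the idea of a database-backed inverted index concrete, the following is a minimal sketch and not the system built in the thesis. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for the database, whitespace tokenization, and one common variant of the Best Match 25 scoring formula with the conventional defaults k1 = 1.2 and b = 0.75. The table names `postings` and `doc_lengths` and the functions `build_index` and `bm25_search` are hypothetical names chosen for illustration.

```python
import math
import sqlite3
from collections import Counter


def build_index(docs):
    """Store an inverted index as a table in an in-memory SQLite database.

    `docs` maps an integer document id to its text. Table and column
    names are illustrative assumptions, not taken from the thesis.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE postings ("
        "term TEXT, doc_id INTEGER, tf INTEGER, "
        "PRIMARY KEY (term, doc_id))"
    )
    conn.execute(
        "CREATE TABLE doc_lengths (doc_id INTEGER PRIMARY KEY, length INTEGER)"
    )
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        conn.execute(
            "INSERT INTO doc_lengths VALUES (?, ?)", (doc_id, len(tokens))
        )
        # One row per (term, document) pair, holding the term frequency.
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(term, doc_id, count) for term, count in Counter(tokens).items()],
        )
    conn.commit()
    return conn


def bm25_search(conn, query, k1=1.2, b=0.75):
    """Rank documents for `query` with a common BM25 variant.

    Returns a list of (doc_id, score) pairs, highest score first.
    """
    n_docs, avg_len = conn.execute(
        "SELECT COUNT(*), AVG(length) FROM doc_lengths"
    ).fetchone()
    lengths = dict(conn.execute("SELECT doc_id, length FROM doc_lengths"))
    scores = Counter()
    for term in set(query.lower().split()):
        # The primary key on (term, doc_id) lets the database find the
        # posting list for a term without scanning the whole table.
        rows = conn.execute(
            "SELECT doc_id, tf FROM postings WHERE term = ?", (term,)
        ).fetchall()
        if not rows:
            continue
        df = len(rows)  # number of documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in rows:
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {
        1: "inverted index maps words to documents",
        2: "a database table can store an index",
        3: "ranking functions score documents",
    }
    conn = build_index(docs)
    for doc_id, score in bm25_search(conn, "inverted index"):
        print(doc_id, round(score, 3))
```

In this toy collection, document 1 contains both query terms and so ranks ahead of document 2, which contains only one of them. Document 3 contains neither and is not returned. Swapping in TF-IDF or a Jelinek-Mercer smoothed unigram model would change only the scoring lines inside `bm25_search`. The `postings` table would stay the same.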
Recommended Citation
Mantheiy, James E. Jr., "The Effects of Index Storage on Ranked Information Retrieval" (2012). Graduate Theses, Dissertations, and Problem Reports. 636.
https://researchrepository.wvu.edu/etd/636