Semester

Fall

Date of Graduation

2012

Document Type

Thesis

Degree Type

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Roy S. Nutter

Committee Co-Chair

Bojan Cukic

Committee Member

Tim Menzies

Abstract

Information retrieval is the process of recalling and ordering all relevant documents based on a user's search query. Examples of information retrieval systems are Google, Bing, and Yahoo search. To perform an effective search, these systems use an inverted index, which maps content, such as words, back to the documents that contain it. It is widely believed that there are two options for implementing an inverted index: in memory or as a file. This investigation examines a third option, implementing the inverted index as a table in a database, and compares it to the other two. In addition, it examines which combination of inverted index implementation and retrieval algorithm performs best, considering TF-IDF, Best Match 25, and a unigram language model with Jelinek-Mercer smoothing. This is determined by designing and developing a system that indexes and searches three collections that differ in data, size, and complexity. The results show that an inverted index implemented in a database is a viable option for information retrieval. It is also noteworthy that Best Match 25 or the unigram language model consistently outperforms TF-IDF. In conclusion, if the collection cannot be indexed in memory, then a database-implemented index is a sufficient second option.
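
The idea of an inverted index stored as a database table, scored with Best Match 25, can be illustrated with a minimal sketch. This is not the system built in the thesis: the use of SQLite, the table names `postings` and `doc_len`, the function names, whitespace tokenization, and the parameter defaults `k1=1.2` and `b=0.75` are all assumptions chosen for illustration, and the IDF term uses a common non-negative variant of the formula.

```python
import math
import sqlite3
from collections import Counter


def build_index(docs):
    """Store postings (term, doc_id, tf) in a database table.

    Table and column names are hypothetical, chosen for this sketch only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE postings (term TEXT, doc_id INTEGER, tf INTEGER)")
    conn.execute("CREATE TABLE doc_len (doc_id INTEGER PRIMARY KEY, length INTEGER)")
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        conn.execute("INSERT INTO doc_len VALUES (?, ?)", (doc_id, len(tokens)))
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in Counter(tokens).items()],
        )
    # An index on the term column keeps posting-list lookups fast.
    conn.execute("CREATE INDEX idx_postings_term ON postings (term)")
    conn.commit()
    return conn


def bm25_search(conn, query, k1=1.2, b=0.75):
    """Rank documents for a query with BM25, reading postings from the table."""
    n_docs, avg_len = conn.execute(
        "SELECT COUNT(*), AVG(length) FROM doc_len"
    ).fetchone()
    scores = {}
    for term in set(query.lower().split()):
        rows = conn.execute(
            "SELECT p.doc_id, p.tf, d.length "
            "FROM postings p JOIN doc_len d ON d.doc_id = p.doc_id "
            "WHERE p.term = ?",
            (term,),
        ).fetchall()
        if not rows:
            continue
        df = len(rows)  # number of documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf, length in rows:
            norm = tf + k1 * (1 - b + b * length / avg_len)
            scores[doc_id] = scores.get(doc_id, 0.0) + idf * tf * (k1 + 1) / norm
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `build_index` on a dictionary of `{doc_id: text}` and then `bm25_search(conn, "some query")` returns `(doc_id, score)` pairs in descending score order. Replacing the `":memory:"` connection with a file path would persist the same table to disk, which is the kind of trade-off the abstract describes.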

Recommended Citation

Mantheiy, James E. Jr., "The Effects of Index Storage on Ranked Information Retrieval" (2012). Graduate Theses, Dissertations, and Problem Reports. 636.
https://researchrepository.wvu.edu/etd/636

DOI

https://doi.org/10.33915/etd.636

Graduate Theses, Dissertations, and Problem Reports

The Effects of Index Storage on Ranked Information Retrieval
