Semester

Summer

Date of Graduation

2009

Document Type

Dissertation

Degree Type

PhD

College

Statler College of Engineering and Mineral Resources

Department

Lane Department of Computer Science and Electrical Engineering

Committee Chair

Donald A Adjeroh

Committee Co-Chair

E James Harner

Abstract

High-dimensional data are widely available in bioinformatics, chemometrics and other applications. For example, gene expression experiments probe tens of thousands of genes. Phenotype data may be clinical data, such as tumor types, or quantities measuring biological characteristics of a subject. While such high-dimensional data can be readily generated, analyzing and modeling them successfully is highly challenging.

Random KNN, as proposed in this dissertation, is a novel generalization of traditional nearest-neighbor modeling. Random KNN consists of an ensemble of base k-nearest-neighbor models, each taking a random subset of the input variables. A theoretical and empirical analysis of the performance of Random KNN is presented. Based on the proposed Random KNN, a new feature selection method is devised. To rank the importance of the variables, a criterion named support is defined and computed within the Random KNN framework. A two-stage backward model selection method is developed using supports. The present study shows that Random KNN is a more effective and more efficient model for high-dimensional data than existing approaches.

The Random KNN approach can be applied to both qualitative and quantitative responses, i.e., classification and regression problems, and has applications in statistics, machine learning, pattern recognition and bioinformatics.

Keywords: classification, regression, feature selection, bioinformatics, gene expression analysis.
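
The abstract describes Random KNN, support-based variable ranking and two-stage backward selection only in outline. The Python sketch below is an illustration under stated assumptions, not the dissertation's implementation. It assumes squared Euclidean distance, majority voting, about sqrt(p) variables per base model, and leave-one-out accuracy as each base model's score. It also assumes that the support of a variable is the mean accuracy of the base models whose subset contains it. The elimination schedule (`drop_frac`, `switch`, `keep`) and every function name are hypothetical. For a quantitative response, the voting step would presumably be replaced by averaging the base predictions.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Classify x by majority vote of its k nearest training rows,
    using squared Euclidean distance on the given feature subset only."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((train_X[i][f] - x[f]) ** 2 for f in features),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def draw_subsets(p, r, m, seed):
    """Hypothetical helper: draw r random feature subsets of size m from p features."""
    rng = random.Random(seed)
    m = m or max(1, math.isqrt(p))  # assumed default: about sqrt(p) features per base model
    return [rng.sample(range(p), m) for _ in range(r)]


def random_knn(train_X, train_y, test_X, k=1, r=50, m=None, seed=0):
    """Ensemble prediction: each of the r base KNNs sees one random feature
    subset, and the ensemble returns the majority vote of their predictions."""
    subsets = draw_subsets(len(train_X[0]), r, m, seed)
    preds = []
    for x in test_X:
        votes = Counter(knn_predict(train_X, train_y, x, k, s) for s in subsets)
        preds.append(votes.most_common(1)[0][0])
    return preds


def loo_accuracy(X, y, k, features):
    """Leave-one-out accuracy of a single base KNN on the training data."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k, features) == y[i]
    return hits / len(X)


def supports(X, y, k=1, r=50, m=None, seed=0):
    """Assumed definition: support of a feature is the mean LOO accuracy of
    the base KNNs whose subset contains it. Unsampled features are omitted."""
    p = len(X[0])
    total, count = [0.0] * p, [0] * p
    for s in draw_subsets(p, r, m, seed):
        acc = loo_accuracy(X, y, k, s)
        for f in s:
            total[f] += acc
            count[f] += 1
    return {f: total[f] / count[f] for f in range(p) if count[f]}


def backward_select(X, y, keep=1, drop_frac=0.5, switch=4, **kw):
    """Hypothetical two-stage backward elimination driven by support:
    stage 1 drops a fraction of the weakest features while more than
    `switch` remain; stage 2 drops one feature at a time down to `keep`."""
    features = list(range(len(X[0])))
    while len(features) > keep:
        sub_X = [[row[f] for f in features] for row in X]
        sup = supports(sub_X, y, **kw)  # keyed by position within `features`
        ranked = sorted(range(len(features)), key=lambda j: sup.get(j, 0.0))
        n_drop = max(1, int(len(features) * drop_frac)) if len(features) > switch else 1
        n_drop = min(n_drop, len(features) - keep)
        dropped = set(ranked[:n_drop])
        features = [f for j, f in enumerate(features) if j not in dropped]
    return features
```

On a toy data set where only variable 0 separates the classes, that variable should receive the highest support and survive elimination. This sketch simplifies the stopping rule to a fixed target size `keep`, whereas a practical method would more likely choose the final subset by accuracy.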

Recommended Citation

Li, Shengqiao, "Random KNN modeling and variable selection for high dimensional data" (2009). Graduate Theses, Dissertations, and Problem Reports. 4492.
https://researchrepository.wvu.edu/etd/4492

DOI

https://doi.org/10.33915/etd.4492

Graduate Theses, Dissertations, and Problem Reports

Random KNN modeling and variable selection for high dimensional data