Semester
Spring
Date of Graduation
2006
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Katerina Goseva Popstojanova
Abstract
Web servers have a significant presence in today's Internet. Corporations want their Web systems to achieve high availability, scalability, and consistent performance while maintaining high customer service standards. Web workload characterization and the analysis of Web log files are the basis on which Web server modeling for efficiency, scalability, and availability can be planned. This thesis analyzes the Web access logs of six public Web sites: the Department of Computer Science and Electrical Engineering at West Virginia University, West Virginia University, three NASA IVV servers, and the Clarknet server. Three private NASA IVV servers are analyzed as well.
We characterize sessions using several attributes, such as the number of requests per session, session length in time units, number of bytes transferred per session, and number of erroneous requests per session. We use clustering, an unsupervised learning method, to classify Web server sessions. Unlike most other studies, which focused on building user profiles based on navigational patterns, we use session attributes as the basis for clustering. We also study the effectiveness of Principal Component Analysis for session classification based on clustering.
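As an illustration only, the four session attributes named in the abstract could be derived from access-log entries roughly as follows. This is a minimal sketch, not the thesis's method: the abstract does not say how sessions are delimited, so the 30-minute inactivity timeout, the tuple layout of a log entry, the use of status codes 400 and above to mark erroneous requests, and the names `Session` and `build_sessions` are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Assumed inactivity threshold in seconds; the abstract does not state one.
SESSION_TIMEOUT = 30 * 60

# Assumed log entry layout: (client, timestamp_seconds, http_status, bytes_sent).
LogEntry = Tuple[str, int, int, int]


@dataclass
class Session:
    client: str
    start: int
    end: int
    requests: int = 0           # number of requests per session
    bytes_transferred: int = 0  # number of bytes transferred per session
    errors: int = 0             # number of erroneous requests per session

    @property
    def length(self) -> int:
        """Session length in seconds (first request to last request)."""
        return self.end - self.start


def build_sessions(entries: Iterable[LogEntry],
                   timeout: int = SESSION_TIMEOUT) -> List[Session]:
    """Group entries by client, starting a new session after `timeout` idle seconds."""
    open_sessions: Dict[str, Session] = {}
    finished: List[Session] = []
    for client, ts, status, size in sorted(entries, key=lambda e: e[1]):
        current = open_sessions.get(client)
        if current is None or ts - current.end > timeout:
            if current is not None:
                finished.append(current)
            current = Session(client, ts, ts)
            open_sessions[client] = current
        current.end = ts
        current.requests += 1
        current.bytes_transferred += size
        if status >= 400:  # assumption: 4xx and 5xx responses count as errors
            current.errors += 1
    finished.extend(open_sessions.values())
    return finished
```

Each resulting session yields one attribute vector (requests, length, bytes, errors), which is the kind of input a clustering algorithm would then receive.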
Recommended Citation
Jha, Deepak, "Web workload analysis and session characterization using clustering" (2006). Graduate Theses, Dissertations, and Problem Reports. 4236.
https://researchrepository.wvu.edu/etd/4236