Semester
Summer
Date of Graduation
2021
Document Type
Thesis
Degree Type
MS
College
Statler College of Engineering and Mineral Resources
Department
Lane Department of Computer Science and Electrical Engineering
Committee Chair
Donald Adjeroh
Committee Co-Chair
Aldo Romero
Committee Member
Brian Powell
Abstract
Ever since the first research journals appeared, the number of academic publications has grown steadily. Today, with the rise of online open-access journals and databases, research papers are easier to read and share. At the same time, it has become harder to keep up with new developments and to gauge the overall impact of a given researcher, institution, journal, or field. For this reason, bibliometric indicators are now routinely used to classify and evaluate the impact or significance of individual researchers, conferences, journals, and entire scientific communities. In this thesis, we provide tools to study trends in any given area of science. We focus our work, however, on Density Functional Theory (DFT), an important methodology in physics and chemistry used to describe materials at the atomic scale. DFT has seen exponential growth in the number of related publications (5,339 in 2010, 9,931 in 2015, and 14,265 in 2019). We measure the specific impact of this theory through the citation records of the most widely used solid-state first-principles (ab initio) computational packages. Alongside this analysis, we developed a Python library, pyBiblio, to perform basic bibliometric analyses on any Web of Science database. To gain a deeper understanding of the field, we also apply the Latent Dirichlet Allocation (LDA) algorithm to the abstracts of papers published in this field to classify documents into topics of interest. LDA is a generative topic-modeling algorithm that describes each document as a distribution over topics, where each topic is itself a distribution over the words in the papers' vocabulary. We find that DFT is a collaborative field with tight international clusters, especially within Europe and among the countries where the packages are developed. We also study the evolution of topics over the years and find evidence that the software packages have become specialized, even though they offer similar capabilities.
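The LDA step described in the abstract can be made concrete with a small sketch. The code below fits LDA with collapsed Gibbs sampling, one standard inference method for the model. It is not necessarily how the thesis fit its topic models (a library implementation may well have been used), and the function name `lda_gibbs`, its hyperparameter defaults (`alpha`, `beta`, `iterations`), and the toy tokenized abstracts are all hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical sketch: fit LDA to tokenized documents by collapsed Gibbs sampling.

    docs: list of lists of word strings (e.g. tokenized abstracts).
    Returns (theta, phi, vocab): per-document topic distributions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables used by the collapsed sampler.
    n_dk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimates of the document-topic and topic-word distributions.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


if __name__ == "__main__":
    # Hypothetical toy "abstracts", already tokenized.
    abstracts = [
        "band gap semiconductor band structure gap".split(),
        "phonon dispersion lattice phonon thermal".split(),
        "band structure gap semiconductor".split(),
        "lattice thermal phonon conductivity".split(),
    ]
    theta, phi, vocab = lda_gibbs(abstracts, num_topics=2)
    for k, topic in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: -topic[v])[:3]
        print(k, [vocab[v] for v in top])
```

Each row of `theta` is one abstract's mixture over topics, and each row of `phi` is one topic's distribution over the vocabulary; the most probable words in a row of `phi` are what one would inspect to label a topic. On a real corpus, stop-word removal and a tuned number of topics matter far more than they do in this toy example.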
Recommended Citation
Dumaz, Marie Coraline, "Topic Modeling and Cultural Nature of Citations" (2021). Graduate Theses, Dissertations, and Problem Reports. 8226.
https://researchrepository.wvu.edu/etd/8226