Semester
Fall
Date of Graduation
2022
Document Type
Dissertation
Degree Type
PhD
College
Statler College of Engineering and Mineral Resources
Department
Mechanical and Aerospace Engineering
Committee Chair
Terence Musho
Committee Co-Chair
Ali Baheri
Committee Member
Ali Baheri
Committee Member
Debangsu Bhattacharyya
Committee Member
Edward Sabolsky
Committee Member
Christina Wildfire
Abstract
With the advent of machine learning (ML) in materials science, it has become clear that trained models are limited by the amount and quality of the data used for training. Materials researchers do not have access to the breadth and depth of labeled data that fields such as image processing and natural language processing enjoy. In the specific application of materials discovery, there is the additional issue of continuity in atomistic datasets. Experimental data mined from literature and patents is often available only for the most favorable cases, which ultimately biases the training dataset. To address this, this research investigates the deployment of ML models trained on synthetic data and develops a language-based approach for synthetically generating training datasets. These approaches were applied to three materials science problems to show that they work: the prediction of dielectric properties, the synthetic generation of chemical reaction datasets, and the synthetic generation of quantum material datasets. All three applications proved successful and demonstrated the ability to generate continuous datasets that resolve the issue of dataset bias.
The first study investigated the synthetic generation of complex dielectric properties of granular powders and the ability of that synthetic data to train an ML network. The neural network was trained using supervised learning with standard backpropagation. The network was double-validated using experimental data collected from a coaxial airline experiment.
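As a rough illustration of supervised training with backpropagation on synthetic data, the following is a minimal sketch and not the network used in the dissertation. The function name `train_tiny_network`, the single tanh hidden layer, the hyperparameters, and the toy sine-curve data (a stand-in for synthetically generated dielectric properties) are all assumptions made for this example.

```python
import math
import random


def train_tiny_network(samples, hidden=4, lr=0.05, epochs=500, seed=0):
    """Fit a 1-input, 1-output network with one tanh hidden layer.

    Hypothetical helper for illustration only. Returns
    (initial_loss, final_loss, predict).
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        y = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, y

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    initial_loss = mean_squared_error()
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            dy = 2.0 * (y - t)  # derivative of squared error w.r.t. output
            for j in range(hidden):
                # Backpropagate through the output weight and the tanh unit.
                dh = dy * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * dy * h[j]
                w1[j] -= lr * dh * x
                b1[j] -= lr * dh
            b2 -= lr * dy
    final_loss = mean_squared_error()
    return initial_loss, final_loss, lambda x: forward(x)[1]


# Toy "synthetic" dataset: labeled (input, target) pairs from a known function.
synthetic = [(x / 10.0, math.sin(x / 10.0)) for x in range(-10, 11)]
initial_loss, final_loss, predict = train_tiny_network(synthetic)
```

In a workflow like the one described, the trained predictor would then be checked against held-out experimental measurements rather than against the synthetic data it was fitted to.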
The second study demonstrated the synthetic generation of a chemical reaction database. An artificial intelligence model based on a variational autoencoder (VAE) was developed and investigated for synthetically generating continuous datasets. The approach samples the latent space to generate new chemical reactions, which are then assembled into the synthetic dataset. The technique was demonstrated by generating over 7,000,000 new reactions from a training dataset containing only 7,000 reactions. The generated reactions include molecular species that are larger and more diverse than those in the training set.
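The latent-space sampling step can be sketched as follows, assuming a standard normal prior over the latent space. This is a minimal illustration, not the dissertation's model: `toy_decoder` is a hypothetical stand-in for a trained VAE decoder, and the four-letter alphabet, latent dimension, and sample count are arbitrary choices for the example.

```python
import random


def sample_latent(dim, rng):
    """Draw one latent vector from a standard normal prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder.

    Maps each latent coordinate to one of four symbols so the example
    runs without a trained model.
    """
    alphabet = "CNOH"
    return "".join(alphabet[min(3, max(0, int(v + 2.0)))] for v in z)


def generate_dataset(decoder, training_set, n_samples, dim=6, seed=0):
    """Sample the latent space and keep decoded items not seen in training."""
    rng = random.Random(seed)
    known = set(training_set)
    generated = set()  # a set removes duplicate decodings automatically
    for _ in range(n_samples):
        candidate = decoder(sample_latent(dim, rng))
        if candidate not in known:
            generated.add(candidate)
    return generated


new_items = generate_dataset(toy_decoder, {"CCCCCC"}, n_samples=200)
```

Because many latent points can decode to the same output, de-duplicating and filtering against the training set is what makes the count of "new" items meaningful. A real pipeline would also need a validity check on each decoded reaction, which is omitted here.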
The third study investigated a variational autoencoder approach similar to that of the second study, applied to generating a synthetic dataset of quantum materials for quantum sensing applications. The specific quantum sensors of interest are two-level quantum molecules that exhibit dipole blockade. This study offers an improved sampling algorithm that continuously feeds newly generated materials back into the sampler to help generate a more normally distributed dataset. This technique generated over 1,000,000 new quantum materials from a small dataset of only 8,000 materials. From the generated dataset, several iodine-containing molecules were identified as candidate quantum sensor materials for future studies.
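The abstract does not spell out the improved sampling algorithm, so the following is only one plausible reading of "feeding newly generated materials back into the sampler", offered as a sketch. The function `feedback_sampling`, the Gaussian perturbation of an encoded parent, and the toy `encode`/`decode` pair operating on integer tuples are all assumptions for illustration.

```python
import random


def feedback_sampling(encode, decode, seeds, rounds, noise=0.5, seed=0):
    """Grow a dataset by re-sampling around items generated earlier.

    Each round picks a parent from the current pool, perturbs its latent
    encoding with Gaussian noise, decodes the result, and adds it to the
    pool if it is new. Newly generated items can therefore become parents.
    """
    rng = random.Random(seed)
    pool = list(dict.fromkeys(seeds))  # ordered, de-duplicated seeds
    n_seeds = len(pool)
    seen = set(pool)
    for _ in range(rounds):
        parent = rng.choice(pool)
        z = [v + rng.gauss(0.0, noise) for v in encode(parent)]
        child = decode(z)
        if child not in seen:
            seen.add(child)
            pool.append(child)
    return pool[n_seeds:]  # only the newly generated items


# Toy encoder/decoder (hypothetical): items are integer tuples.
def encode(item):
    return [float(v) for v in item]


def decode(z):
    return tuple(round(v) for v in z)


new_materials = feedback_sampling(encode, decode, [(0, 0)], rounds=200)
```

The design point this illustrates is that the pool of parents grows during sampling, so later draws are not confined to the neighborhood of the original training items. Whether this yields a more normally distributed dataset depends on the actual model and is not shown by this toy example.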
Recommended Citation
Tempke, Robert, "Artificial Intelligence based Approach for Rapid Material Discovery: From Chemical Synthesis to Quantum Materials" (2022). Graduate Theses, Dissertations, and Problem Reports. 11518.
https://researchrepository.wvu.edu/etd/11518