Semester

Fall

Date of Graduation

2022

Document Type

Dissertation

Degree Type

PhD

College

Statler College of Engineering and Mineral Resources

Department

Mechanical and Aerospace Engineering

Committee Chair

Terence Musho

Committee Co-Chair

Ali Baheri

Committee Member

Ali Baheri

Committee Member

Debangsu Bhattacharyya

Committee Member

Edward Sabolsky

Committee Member

Christina Wildfire

Abstract

With the advent of machine learning (ML) in the field of materials science, it has become clear that trained models are limited by the amount and quality of the data used for training. Researchers in this field do not have access to the breadth and depth of labeled data that fields like image processing and natural language processing enjoy. In the specific application of materials discovery, there is the added issue of continuity in atomistic datasets. When one relies on experimental data mined from literature and patents, data is often available only for the most favorable cases, which ultimately biases the training dataset. As a solution, this research investigates the deployment of ML models trained on synthetic data and the development of a language-based approach for synthetically generating training datasets. These approaches were applied to three materials science problems to demonstrate that they work: the prediction of dielectric properties, the synthetic generation of chemical reaction datasets, and the synthetic generation of quantum material datasets. All three applications proved successful and demonstrated the ability to generate continuous datasets that resolve the issue of dataset bias.

The first study investigated the synthetic generation of complex dielectric properties of granular powders and the ability of that synthetic data to train an ML network. The neural network was trained using a supervised learning approach with standard backpropagation. The network was double-validated using experimental data collected from a coaxial airline experiment.
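As a rough illustration of supervised training with backpropagation, the following is a minimal sketch of a one-hidden-layer regression network written with NumPy. The abstract does not give the network architecture, input features, or loss used in the dissertation, so the function name `train_mlp`, the layer size, the learning rate, and the mean squared error loss are all assumptions made for the example.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, epochs=300, seed=0):
    """Train a tiny tanh network by full-batch gradient descent (illustrative only)."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((X.shape[1], hidden)) * 0.5
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal((hidden, 1)) * 0.5
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: inputs -> hidden activations -> prediction
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: chain rule applied to the mean squared error
        g_pred = 2.0 * err / len(X)
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_h = g_pred @ W2.T
        g_z = g_h * (1.0 - h ** 2)  # derivative of tanh
        g_W1 = X.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Gradient descent update
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses
```

Here `X` is an array of shape `(n_samples, n_features)` and `y` has shape `(n_samples, 1)`. In the setting described above, the training pairs would come from the synthetically generated dielectric data, with the experimental measurements held out for validation.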

The second study demonstrated the synthetic generation of a chemical reaction database. An artificial intelligence model based on a variational autoencoder (VAE) was developed and investigated to synthetically generate continuous datasets. The approach involves sampling the latent space to generate new chemical reactions, which are then assembled into the synthetic dataset. The technique is demonstrated by generating over 7,000,000 new reactions from a training dataset containing only 7,000 reactions. The generated reactions include molecular species that are larger and more diverse than those in the training set.
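The latent-space sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's model: `toy_decoder` is a hypothetical stand-in (a random linear map followed by an argmax over a small token vocabulary) for a trained VAE decoder, and the function names, vocabulary, and dimensions are invented for the example.

```python
import numpy as np

def sample_latent(n_samples, latent_dim, rng):
    """Draw latent vectors from the standard normal prior of a VAE."""
    return rng.standard_normal((n_samples, latent_dim))

def toy_decoder(z, weights, vocab):
    """Hypothetical decoder: map each latent vector to a fixed-length token string."""
    logits = z @ weights  # shape (n_samples, seq_len * vocab_size)
    logits = logits.reshape(z.shape[0], -1, len(vocab))
    token_ids = logits.argmax(axis=-1)
    return ["".join(vocab[i] for i in row) for row in token_ids]

def generate_unique(n_samples, latent_dim, weights, vocab, known, seed=0):
    """Sample the latent space, decode, and keep only strings not already known."""
    rng = np.random.default_rng(seed)
    z = sample_latent(n_samples, latent_dim, rng)
    decoded = toy_decoder(z, weights, vocab)
    return sorted(set(decoded) - set(known))
```

The filtering against `known` mirrors the idea of assembling only new entries into the synthetic dataset. A real pipeline would also need to check that each decoded string is a chemically valid reaction, which this sketch does not attempt.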

The third study investigated a VAE approach similar to that of the second study, applied to generating a synthetic dataset of quantum materials for quantum sensing applications. The specific quantum sensors of interest are two-level quantum molecules that exhibit dipole blockade. This study offers an improved sampling algorithm that continuously feeds newly generated materials back into the sampler to help generate a more normally distributed dataset. The technique generated over 1,000,000 new quantum materials from a small dataset of only 8,000 materials. From the generated dataset, several iodine-containing molecules were identified as candidate quantum sensor materials for future studies.
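The idea of feeding generated samples back into the sampler can be sketched as a simple loop over a growing pool of latent vectors. The abstract does not specify the actual algorithm, so this is only one possible reading: the function name `feedback_sampling`, the Gaussian perturbation, and the `noise_scale` parameter are assumptions for illustration, and the decode and re-encode steps of a real VAE are omitted.

```python
import numpy as np

def feedback_sampling(seed_latents, n_rounds, per_round, noise_scale=0.5, seed=0):
    """Grow a pool of latent vectors by repeatedly perturbing members of the pool itself."""
    rng = np.random.default_rng(seed)
    pool = np.array(seed_latents, dtype=float)
    for _ in range(n_rounds):
        # Pick parents from the current pool, which includes earlier generations.
        parents = pool[rng.integers(0, len(pool), size=per_round)]
        # Perturb each parent with Gaussian noise to propose new points.
        children = parents + noise_scale * rng.standard_normal(parents.shape)
        # Feed the new points back so later rounds can sample from them.
        pool = np.vstack([pool, children])
    return pool
```

Because each round draws parents from everything generated so far, the pool can spread beyond the region covered by the original seeds. Whether the resulting distribution approaches a normal one depends on details the abstract does not give.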
