Date of Graduation


Document Type


Degree Type



Eberly College of Arts and Sciences


Physics and Astronomy

Committee Chair

James P. Lewis

Committee Co-Chair

Cheng Cen

Committee Member

Cheng Cen

Committee Member

Edward Flagg

Committee Member

Tudor Stanescu

Committee Member

Xiao-Dong Wen


In the last ten years, machine learning potentials have been successfully applied to the study of crystals and molecules. However, more complex materials such as clusters, macromolecules, and glasses are out of reach of current methods. The input of any machine learning system is a tensor of features (the most universal type being rank-1 tensors, or feature vectors), and the quality of any machine learning system is directly related to how well the feature space describes the original physical system. So far, the feature engineering process for machine learning potentials cannot describe complex materials. Current methods are highly inefficient at transforming the information of the physical structure into the feature vector, and the resulting loss of information constrains the accuracy of machine learning potentials. This work introduces the Structural Information Filtered Features (SIFF), a feature engineering method based on maximizing the transfer of information from the physical structure to the feature space. The SIFF is intended as a universal feature, universal in two senses. First, it is able to describe complex systems as well as molecules and crystals. Second, it can be easily used as input for any machine learning algorithm. When applied to crystals, the SIFF performs as well as the best feature engineering methods for these materials (SOAP, CGNN). When applied to molecules, the SIFF performs better than the Bag of Bonds method, especially when the number of structures is reduced to fewer than 10000; under these conditions the SIFF shows superior performance, owing to its superior information transfer. With respect to complex systems, the SIFF is compared to the Behler and Parrinello approach: the SIFF method reaches an error of 0.083 eV/structure in 18110 seconds, whereas the Behler and Parrinello method achieves an error of 0.109 eV/structure in 61969 seconds.
The main disadvantage of the SIFF method is that the dimensionality of the feature space grows exponentially with the number of chemical species in the system.