Predictive Synthesis

Advances in computational materials design allow researchers to rapidly screen both real and hypothetical materials for desirable properties. As a result, the bottleneck in materials design has shifted from the discovery of novel materials to their synthesis. To commercialize advanced materials quickly, we need to move beyond trial-and-error synthesis techniques. Knowing the synthesis routes for new materials also makes it possible to estimate the environmental impact of novel technologies before bringing them to market [1].

The goal of this project is to advance materials synthesis techniques using data and computational models, i.e., to do for materials synthesis what modern computational methods have done for materials structures and properties. Since first-principles methods for synthesis are limited, we derive synthesis models using data and machine learning. Most synthesis information is found in the scientific literature, and extracting and formatting it requires natural language processing techniques. We have developed and maintain a computational pipeline that automatically downloads millions of chemistry and materials science journal articles, determines which paragraphs contain relevant synthesis information, and extracts synthesis data such as the precursors, types of operations, target materials, and synthesis conditions [2]. These data are then used in machine learning and data mining to inform the synthesis of novel materials and improve existing synthesis methods [3][4]. More information, along with synthesis datasets and numerous synthesis data extraction tools, is available online. Described below are several materials systems the group has studied with this methodology.
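The classification and extraction stages described above can be illustrated with a deliberately small sketch. It is not the group's pipeline, which relies on natural language processing and machine learning; it is a rule-based stand-in that assumes a hypothetical list of operation terms, simple regular expressions for temperatures and times, and a made-up example paragraph. The names `OPERATION_TERMS`, `SynthesisRecord`, `is_synthesis_paragraph`, and `extract_record` are introduced here for illustration only, and the download stage is left out.

```python
import re
from dataclasses import dataclass, field

# Hypothetical operation vocabulary (an assumption for this sketch only).
OPERATION_TERMS = ("mixed", "heated", "calcined", "sintered",
                   "annealed", "stirred", "dried")
OPERATION_RE = re.compile(r"\b(" + "|".join(OPERATION_TERMS) + r")\b")

# Simple patterns for conditions such as "900 °C" and "12 h" / "30 min".
TEMPERATURE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*°\s*C")
TIME_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(hours?|h|minutes?|min)\b")


@dataclass
class SynthesisRecord:
    """Container for the fields pulled out of one paragraph."""
    operations: list = field(default_factory=list)
    temperatures_c: list = field(default_factory=list)
    times: list = field(default_factory=list)


def is_synthesis_paragraph(paragraph, min_hits=2):
    """Flag a paragraph as synthesis-related if it names enough operations."""
    return len(OPERATION_RE.findall(paragraph.lower())) >= min_hits


def extract_record(paragraph):
    """Extract operations, temperatures (°C), and durations in order of appearance."""
    record = SynthesisRecord()
    record.operations = OPERATION_RE.findall(paragraph.lower())
    record.temperatures_c = [float(t) for t in TEMPERATURE_RE.findall(paragraph)]
    record.times = [(float(v), unit) for v, unit in TIME_RE.findall(paragraph)]
    return record


# Made-up example paragraph, not taken from any article.
SAMPLE = ("The precursor powders were mixed, heated at 900 °C for 12 h, "
          "and then annealed at 450 °C for 30 min.")

if __name__ == "__main__":
    if is_synthesis_paragraph(SAMPLE):
        print(extract_record(SAMPLE))
```

Keyword and regex rules like these are brittle on real journal text, which is one reason learned models are used for this task; the sketch only shows the shape of the data that flows from a paragraph into a structured record.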

[1] Kim, Edward, et al., “Materials synthesis insights from scientific literature via text extraction and machine learning”, Chemistry of Materials, 29.21 (2017), 9436-9444.

[2] Kim, Edward, et al., “Machine-learned and codified synthesis parameters of oxide materials”, Scientific Data, 4 (2017), 170127.

[3] Kim, Edward, et al., “Virtual screening of inorganic materials synthesis parameters with deep learning”, npj Computational Materials, 3.1 (2017), 53.

[4] Kim, Edward, et al., “Inorganic materials synthesis planning with literature-trained neural networks”, arXiv preprint arXiv:1901.00032 (2018).


Zeolites are crystalline, nanoporous aluminosilicates with numerous applications in catalysis, carbon capture, NOx abatement, and water decontamination. Because of their metastability and complex kinetics, zeolite synthesis is difficult to control and reproduce: very small changes in conditions can lead to large changes in a zeolite’s structure. This work focuses on extracting synthesis conditions from the zeolite literature and using machine learning to model zeolite structure as a function of the synthesis conditions.

Using random forest regression, we demonstrated the ability to model the synthesis-structure relationship for germanium-containing zeolites. The tree model also provides human-interpretable pathways that offer guidance for synthesizing low-density zeolites [5][6].
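As a rough illustration of this kind of model, the sketch below fits a random forest regressor with scikit-learn and prints the decision rules of one tree in the ensemble. Everything specific in it is an assumption: the feature names, the synthetic data, and the toy formula that generates the target are made up for the example and carry no physical meaning. It does not reproduce the features, data, or results of the published work.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_text

# Hypothetical synthesis descriptors (assumed names, not the published feature set).
FEATURES = ["ge_si_ratio", "h2o_t_ratio", "temperature_c"]


def make_synthetic_data(n_samples=200, seed=0):
    """Generate made-up synthesis records with an arbitrary toy target."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.uniform(0.0, 1.0, n_samples),      # ge_si_ratio
        rng.uniform(1.0, 50.0, n_samples),     # h2o_t_ratio
        rng.uniform(100.0, 200.0, n_samples),  # temperature_c
    ])
    # Arbitrary relationship plus noise; NOT a physical model of zeolites.
    y = 18.0 - 4.0 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0.0, 0.2, n_samples)
    return X, y


def fit_forest(X, y, seed=0):
    """Fit a small forest of shallow trees so individual trees stay readable."""
    model = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=seed)
    model.fit(X, y)
    return model


def describe_first_tree(model):
    """Return the if/else decision paths of one tree as plain text."""
    return export_text(model.estimators_[0], feature_names=FEATURES)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    model = fit_forest(X, y)
    for name, importance in zip(FEATURES, model.feature_importances_):
        print(f"{name}: {importance:.3f}")
    print(describe_first_tree(model))
```

Keeping the trees shallow trades some accuracy for readability: each root-to-leaf path reads as a short chain of threshold conditions on the synthesis descriptors, which is the sense in which a tree model can suggest a pathway toward a target structure.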


[5] Jensen, Zach, et al., “A machine learning approach to zeolite synthesis enabled by automatic literature data extraction”, ACS Central Science (2019).

[6] Schwalbe-Koda, Daniel, et al., “Graph similarity drives zeolite diffusionless transformations and intergrowth”, Nature Materials, 18.11 (2019), 1177-1181.


Aluminum alloys are important in many industries, with applications ranging from lightweighting automobiles and increasing their fuel efficiency to serving as the most sustainable beverage container material on nearly all measures [2]. Adding alloying elements greatly affects the physical and chemical properties of aluminum, mostly via its microstructure, but predicting these effects a priori is challenging because of the complexity of multicomponent phase diagrams that include all the alloying elements and the non-equilibrium nature of alloy processing. This work aims to extract compositional and processing information from the literature and link it to the structural and mechanical properties of aluminum alloys. It also seeks to integrate phase diagrams and physics-based models with machine learning to obtain better, more physically meaningful, and more interpretable models.


Alternative Cementitious Materials

Because of the material choices at the center of conventional cement production, CO2 emissions are inherent to the process. Given the scale of cement use, these emissions are estimated to be in the range of 5-8% of the global annual greenhouse gas (GHG) footprint. Developing alternative cementitious materials is therefore essential to reducing the associated environmental impacts. Potential alternative resources include industrial wastes such as coal fly ash, metallurgical slags, and biomass ash, which have the added benefit of avoiding the landfill burden typically associated with them. Using such materials avoids the calcination of limestone necessary for cement production, thereby greatly reducing emissions. When reacted in alkaline environments, these wastes create binders with properties similar to cement. However, chemistry and physical properties vary significantly between waste streams, and even from batch to batch, making it difficult to efficiently predict optimal material blends. This work extracts chemical, structural, and physical properties of wastes from the literature in order to relate these properties to material reactivity, thereby enabling the production of next-generation construction materials.