Machine learning comprises a set of techniques by which structure and correlations are extracted from data automatically. The derived models are then used to make predictions about new situations not seen in the data. In the past decade, machine learning has turned from a niche discipline into a technology that influences more and more parts of our daily lives and the way we do science.
In computational chemistry, a particularly successful application was the development of machine learning potentials, which vastly extend the time and length scales accessible to molecular simulations while largely preserving the accuracy of the more expensive methods (like density functional theory (DFT) or coupled cluster (CC) theory) that were used to generate the training data.
An important aspect of building successful machine learning models is the use of physical priors. These can reduce the amount of training data required and improve the generalization of trends beyond the distribution seen at training time. One such physical prior is equivariance: we know that molecular properties like scalars (e.g. the total energy) or vectors (e.g. the dipole moment) transform in a well-defined way under rotations and translations of the molecular geometry in space. So-called equivariant neural networks are constructed such that they satisfy this constraint by design. Another physical prior lies in the use of a computationally cheap (e.g. semi-empirical) baseline quantum-chemical method, with machine learning used only to learn the correction on top of the baseline that approximates the results of higher-accuracy calculations. This is called Δ-machine learning. One of our research directions is the use of equivariant information extracted from semi-empirical calculations to construct novel equivariant Δ-machine learning methods.
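The two priors can be illustrated with toy numerical examples. The following sketch (not our actual method; all model choices here are stand-ins) first checks the equivariance condition f(Rx) = Rf(x) for a vector-valued property, using the dipole moment of point charges, and then shows the Δ-learning idea in one dimension: fit only the difference between a cheap baseline and an expensive target, and add it back at prediction time.

```python
import numpy as np

# --- Equivariance ---------------------------------------------------------
# The dipole moment of a set of point charges is an equivariant
# (vector-valued) function of the geometry: rotating the coordinates
# rotates the dipole in the same way.
def dipole(positions, charges):
    # mu = sum_i q_i * r_i
    return charges @ positions

rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 3))  # 5 "atoms", Cartesian coordinates
charges = rng.normal(size=5)

theta = np.pi / 6                    # rotation about the z-axis by 30 degrees
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

mu = dipole(positions, charges)
mu_rotated_input = dipole(positions @ R.T, charges)
assert np.allclose(mu_rotated_input, R @ mu)  # f(R x) = R f(x)

# --- Δ-learning -----------------------------------------------------------
# A 1D caricature: the "baseline" stands in for a semi-empirical method,
# the "target" for DFT/CC reference data, and a polynomial fit stands in
# for the machine learning model of the correction.
x = np.linspace(0.0, 1.0, 50)
baseline = x**2
target = x**2 + 0.1 * np.sin(5 * x)

coeffs = np.polyfit(x, target - baseline, deg=7)   # learn only the difference
prediction = baseline + np.polyval(coeffs, x)      # baseline + learned correction
assert np.max(np.abs(prediction - target)) < 1e-3
```

The smooth correction is much easier to fit than the full target, which is the practical appeal of the Δ approach.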
Furthermore, to facilitate the automated deployment of multireference calculations, we aim to machine-learn orbital entropies obtained from density matrix renormalization group (DMRG) calculations, enabling computationally cheaper automated active space selection. An envisaged application is the analysis of the energetics of reactions catalyzed by transition metal complexes at the multireference level.
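A common entropy-based strategy is to include in the active space those orbitals whose single-orbital entropy is large relative to the maximum; in our setting, a trained model would supply the entropies in place of an expensive DMRG calculation. The following is a minimal sketch of such a threshold rule, with a hypothetical function name and made-up entropy values (not from any real calculation):

```python
import numpy as np

def select_active_space(entropies, threshold=0.1):
    """Return indices of orbitals whose single-orbital entropy exceeds
    a fraction `threshold` of the maximum entropy (a common heuristic;
    the name and default are illustrative, not a published protocol)."""
    entropies = np.asarray(entropies)
    cutoff = threshold * entropies.max()
    return np.flatnonzero(entropies > cutoff)

# Made-up single-orbital entropies for 8 orbitals:
s1 = [0.02, 0.05, 0.71, 0.64, 0.68, 0.03, 0.01, 0.40]
print(select_active_space(s1))  # -> [2 3 4 7], the strongly correlated orbitals
```

In practice the selection would be iterated and validated, but the sketch shows why cheap, reliable entropy predictions would make the whole procedure automatic.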
This research theme started in connection with my Marie Skłodowska-Curie postdoctoral fellowship funded by the European Union (“ML4Catalysis”, grant number 101025672).
Related publications:
- Hannes Kneiding, Ruslan Lukin, Lucas Lang, Simen Reine, Thomas Bondo Pedersen, Riccardo de Bin and David Balcells, Deep learning metal complex properties with natural quantum graphs, Digital Discovery 2, 618 (2023).
- Lucas Lang, Thomas Bondo Pedersen and David Balcells, Leveraging physical knowledge in Δ-machine learning with equivariant graph neural networks, in preparation.
- Lucas Lang, Maximilian Mörchen, Miguel Steiner, Thomas Weymuth, Markus Reiher, Thomas Bondo Pedersen and David Balcells, Machine learning in the reaction network of a homogeneous catalyst, in preparation.