Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada.
Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada.
Acc Chem Res. 2021 Feb 16;54(4):849-860. doi: 10.1021/acs.accounts.0c00785. Epub 2021 Feb 2.
The ongoing revolution of the natural sciences by the advent of machine learning and artificial intelligence has sparked significant interest in the material science community in recent years. The intrinsically high dimensionality of the space of realizable materials makes traditional approaches ineffective for large-scale explorations. Modern data science and machine learning tools developed for increasingly complicated problems are an attractive alternative. An imminent climate catastrophe calls for a clean energy transformation that overhauls current technologies within the few years in which action is still possible. Tackling this crisis requires the development of new materials at an unprecedented pace and scale. For example, organic photovoltaics have the potential to replace existing silicon-based materials to a large extent and open up new fields of application. In recent years, organic light-emitting diodes have emerged as state-of-the-art technology for digital screens and portable devices and are enabling new applications with flexible displays. Reticular frameworks allow the atom-precise synthesis of nanomaterials and promise to revolutionize the field through their potential to realize multifunctional nanoparticles with applications ranging from gas storage, gas separation, and electrochemical energy storage to nanomedicine. In the past decade, significant advances in all these fields have been facilitated by the comprehensive application of simulation and machine learning for property prediction, property optimization, and chemical space exploration, enabled by considerable advances in computing power and algorithmic efficiency.
In this Account, we review the most recent contributions of our group in this thriving field of machine learning for material science. We start with a summary of the most important material classes our group has been involved in, focusing on small molecules as organic electronic materials and on crystalline materials. Specifically, we highlight the data-driven approaches we employed to speed up discovery and derive material design strategies. Subsequently, we turn to the data-driven methodologies our group has developed and employed, elaborating on high-throughput virtual screening, inverse molecular design, Bayesian optimization, and supervised learning. We discuss the general ideas, their working principles, and their use cases with examples of successful implementations in data-driven material discovery and design efforts. Furthermore, we elaborate on potential pitfalls and remaining challenges of these methods. Finally, we provide a brief outlook for the field, as we foresee increasing adoption and implementation of large-scale data-driven approaches in material discovery and design campaigns.