Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA; email:
eScience Institute, University of Washington, Seattle, Washington 98195, USA.
Annu Rev Chem Biomol Eng. 2021 Jun 7;12:15-37. doi: 10.1146/annurev-chembioeng-101220-102232. Epub 2021 Mar 12.
Chemical engineering is being rapidly transformed by the tools of data science. On the horizon, artificial intelligence (AI) applications will affect a large swath of our work, ranging from the discovery and design of new molecules to operations and manufacturing, and many areas in between. The early adoption of data science, machine learning, and AI in chemical engineering has produced many examples of molecular data science: the application of these tools to molecular discovery and property optimization at the atomic scale. We summarize key advances in this nascent subfield while introducing molecular data science to a broad chemical engineering readership. We introduce the field through the concept of a molecular data science life cycle and discuss relevant aspects of five distinct phases of this process: creation of curated data sets, molecular representations, data-driven property prediction, generation of new molecules, and feasibility and synthesizability considerations.