Dobbelaere Maarten R, Lengyel István, Stevens Christian V, Van Geem Kevin M
Laboratory for Chemical Technology, Department of Materials, Textiles and Chemical Engineering, Faculty of Engineering and Architecture, Ghent University, Technologiepark 125, 9052, Ghent, Belgium.
ChemInsights LLC, Dover, DE, 19901, USA.
J Cheminform. 2024 Aug 13;16(1):99. doi: 10.1186/s13321-024-00895-0.
Chemical engineers heavily rely on precise knowledge of physicochemical properties to model chemical processes. Despite the growing popularity of deep learning, it is only rarely applied for property prediction due to data scarcity and limited accuracy for compounds in industrially-relevant areas of the chemical space. Herein, we present a geometric deep learning framework for predicting gas- and liquid-phase properties based on novel quantum chemical datasets comprising 124,000 molecules. Our findings reveal that the necessity for quantum-chemical information in deep learning models varies significantly depending on the modeled physicochemical property. Specifically, our top-performing geometric model meets the most stringent criteria for "chemically accurate" thermochemistry predictions. We also show that by carefully selecting the appropriate model featurization and evaluating prediction uncertainties, the reliability of the predictions can be strongly enhanced. These insights represent a crucial step towards establishing deep learning as the standard property prediction workflow in both industry and academia.
Scientific contribution
We propose a flexible property prediction tool that can handle two-dimensional and three-dimensional molecular information. A thermochemistry prediction methodology that achieves high-level quantum chemistry accuracy for a broad application range is presented. Trained deep learning models and large novel molecular databases of real-world molecules are provided to offer a directly usable and fast property prediction solution to practitioners.