Data Generation for Machine Learning Interatomic Potentials and Beyond.

Author information

Kulichenko Maksim, Nebgen Benjamin, Lubbers Nicholas, Smith Justin S, Barros Kipton, Allen Alice E A, Habib Adela, Shinkle Emily, Fedik Nikita, Li Ying Wai, Messerly Richard A, Tretiak Sergei

Affiliations

Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States.

Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States.

Publication information

Chem Rev. 2024 Dec 25;124(24):13681-13714. doi: 10.1021/acs.chemrev.4c00572. Epub 2024 Nov 21.

Abstract

The field of data-driven chemistry is undergoing an evolution, driven by innovations in machine learning (ML) models for predicting molecular properties and behavior. Recent strides in ML-based interatomic potentials (MLIPs) have paved the way for accurate modeling of diverse chemical and structural properties at the atomic level. The key determinant defining MLIP reliability remains the quality of the training data. A paramount challenge lies in constructing training sets that capture specific domains in the vast chemical and structural space. This Review navigates the intricate landscape of essential components and integrity of training data that ensure the extensibility and transferability of the resulting models. We delve into the details of active learning, discussing its various facets and implementations. We outline different types of uncertainty quantification applied to atomistic data acquisition and the correlations between estimated uncertainty and true error. The role of atomistic data samplers in generating diverse and informative structures is highlighted. Furthermore, we discuss data acquisition via modified and surrogate potential energy surfaces as an innovative approach to diversify training data. The Review also provides a list of publicly available data sets that cover essential domains of chemical space.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f07/11672690/b7f8b3bb38ca/cr4c00572_0001.jpg