Unsupervised machine learning in atomistic simulations, between predictions and understanding.

Author information

Ceriotti Michele

Affiliation

Laboratory of Computational Science and Modeling, Institut des Matériaux, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland.

Publication information

J Chem Phys. 2019 Apr 21;150(15):150901. doi: 10.1063/1.5091842.

Abstract

Automated analyses of the outcome of a simulation have been an important part of atomistic modeling since the early days, addressing the need to link the behavior of individual atoms to the collective properties that are usually the final quantities of interest. Methods such as clustering and dimensionality reduction have been used to provide a simplified, coarse-grained representation of the structure and dynamics of complex systems, from proteins to nanoparticles. In recent years, the rise of machine learning has led to even more widespread use of these algorithms in atomistic modeling, and to different classification and inference techniques being regarded as parts of a coherent toolbox of data-driven approaches. This perspective briefly reviews some of the unsupervised machine-learning methods that are geared toward classification and coarse-graining of molecular simulations, seen in relation to the fundamental mathematical concepts that underlie all machine-learning techniques. It discusses the importance of using concise yet complete representations of atomic structures as the starting point of the analyses and highlights the risk of introducing preconceived biases when using machine learning to rationalize and understand structure-property relations. Supervised machine-learning techniques that explicitly attempt to predict the properties of a material given its structure are less susceptible to such biases. Current developments in the field suggest that using these two classes of approaches side by side and in a fully integrated mode, while keeping in mind the relations between the data analysis framework and the fundamental physical principles, will be key to realizing the full potential of machine learning to help understand the behavior of complex molecules and materials.
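
As a rough illustration of the kind of clustering analysis the abstract refers to, the sketch below runs a plain k-means (Lloyd's algorithm) on a handful of per-atom descriptor vectors. It is not taken from the paper: the two descriptor components (a coordination number and a generic order parameter), the synthetic "surface-like" and "bulk-like" data, and the function name `kmeans` are all assumptions made for this example, and the deterministic farthest-point initialization is just one convenient choice.

```python
import math
import random


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means on a list of equal-length tuples (illustrative helper)."""
    # Deterministic farthest-point initialization: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Synthetic, hypothetical descriptors: (coordination number, order parameter).
# These numbers are made up for the example and do not come from any simulation.
rng = random.Random(0)
surface_like = [(rng.gauss(6.0, 0.3), rng.gauss(0.2, 0.05)) for _ in range(20)]
bulk_like = [(rng.gauss(12.0, 0.3), rng.gauss(0.5, 0.05)) for _ in range(20)]
descriptors = surface_like + bulk_like

labels, centroids = kmeans(descriptors, k=2)
for j, c in enumerate(centroids):
    print(f"cluster {j}: {labels.count(j)} atoms, centroid = ({c[0]:.2f}, {c[1]:.2f})")
```

The cluster labels here are a coarse-grained description chosen by the algorithm from the descriptors alone, which is also why the choice of representation matters: a descriptor that cannot distinguish two environments will hide the difference from any clustering built on it.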
