Importance of feature construction in machine learning for phase transitions.

Authors

Jang Inhyuk, Kaur Supreet, Yethiraj Arun

Affiliation

Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.

Publication information

J Chem Phys. 2022 Sep 7;157(9):094904. doi: 10.1063/5.0102187.

Abstract

Machine learning is an important tool in the study of phase behavior from molecular simulations. In this work, we use unsupervised machine learning methods to study the phase behavior of two off-lattice models: a binary Lennard-Jones (LJ) mixture and the Widom-Rowlinson (WR) non-additive hard-sphere mixture. Most previous work has focused on lattice models, such as the 2D Ising model, where the spin values serve as the feature vector that is input into the machine learning algorithm, with considerable success. For these two off-lattice models, we find that the choice of the feature vector is crucial to the ability of the algorithm to predict a phase transition, and that the appropriate choice depends on the particular model system being studied. We consider two feature vectors: one whose elements are the distances of the particles of a given species from a probe (distance-based feature), and one whose elements are +1 if there is an excess of particles of the same species within a cut-off distance and -1 otherwise (affinity-based feature). We use principal component analysis and t-distributed stochastic neighbor embedding to investigate the phase behavior at a critical composition. We find that the choice of the feature vector is the key to the success of the unsupervised machine learning algorithm in predicting the phase behavior, and that the sophistication of the machine learning algorithm is of secondary importance. For the LJ mixture, both feature vectors are adequate to accurately predict the critical point. For the WR mixture, the affinity-based feature vector provides accurate estimates of the critical point, whereas the distance-based feature vector does not provide a clear signature of the phase transition. The study suggests that physical insight into the choice of input features is an important aspect of implementing machine learning methods.
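
The two feature vectors described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the probe position, the cut-off value, the ordering of the distances, or how ties are handled, so the origin probe, the cut-off of 1.1, the sorted distances, the tie rule, and all function names below are assumptions made for illustration. The two toy configurations (a block-demixed lattice and a checkerboard) are artificial stand-ins, not simulation snapshots.

```python
import math


def minimum_image(delta, box):
    """Wrap a coordinate difference into the periodic box (assumed cubic/square)."""
    return delta - box * round(delta / box)


def distance(a, b, box):
    """Periodic distance between two points."""
    return math.sqrt(sum(minimum_image(x - y, box) ** 2 for x, y in zip(a, b)))


def distance_feature(positions, species, target, probe, box):
    """Distances from the probe to every particle of species `target`.

    Sorting the distances is an assumption made here so that the vector
    does not depend on particle labelling.
    """
    return sorted(
        distance(p, probe, box) for p, s in zip(positions, species) if s == target
    )


def affinity_feature(positions, species, cutoff, box):
    """+1 if a particle has more like than unlike neighbours within `cutoff`, else -1.

    Ties are mapped to -1 here; the abstract does not say how they are treated.
    """
    feature = []
    for i, (pi, si) in enumerate(zip(positions, species)):
        like = unlike = 0
        for j, (pj, sj) in enumerate(zip(positions, species)):
            if i == j or distance(pi, pj, box) > cutoff:
                continue
            if si == sj:
                like += 1
            else:
                unlike += 1
        feature.append(1 if like > unlike else -1)
    return feature


# Toy 2D configurations on a 10 x 10 square lattice with unit spacing.
BOX = 10.0
CUTOFF = 1.1  # hypothetical value: captures only the four nearest neighbours
positions = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]

# "Demixed": species A fills the left half of the box, species B the right half.
demixed_species = ["A" if i < 5 else "B" for i in range(10) for j in range(10)]
# "Mixed": checkerboard, so every nearest neighbour is of the other species.
mixed_species = ["A" if (i + j) % 2 == 0 else "B" for i in range(10) for j in range(10)]

demixed_affinity = affinity_feature(positions, demixed_species, CUTOFF, BOX)
mixed_affinity = affinity_feature(positions, mixed_species, CUTOFF, BOX)
demixed_distance = distance_feature(positions, demixed_species, "A", (0.0, 0.0), BOX)

print("mean affinity, demixed:", sum(demixed_affinity) / len(demixed_affinity))
print("mean affinity, mixed:  ", sum(mixed_affinity) / len(mixed_affinity))
print("number of A distances: ", len(demixed_distance))
```

In a workflow like the one the abstract describes, one such vector would be built per simulation configuration, and the stacked vectors would then be passed to principal component analysis or t-distributed stochastic neighbor embedding.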

