Croy Alexander
Institute of Physical Chemistry, Friedrich Schiller University Jena, 07737 Jena, Germany.
ACS Omega. 2024 Apr 24;9(18):20616-20622. doi: 10.1021/acsomega.4c02770. eCollection 2024 May 7.
The similarity of local atomic environments is an important concept in many machine learning techniques that find applications in computational chemistry and materials science. Here, we present and discuss a connection between the information entropy and the similarity matrix of a molecule. The resulting entropy can be used as a measure of the complexity of a molecule. As examples, we introduce and evaluate two specific choices for defining the similarity: one is based on a SMILES representation of local substructures, and the other is based on the SOAP kernel. By tuning the sensitivity of the latter, we can achieve good agreement between the respective entropies. Finally, we consider the entropy of two molecules in a mixture. The gain in entropy due to mixing can be used as a similarity measure for the two molecules. We compare this measure to the average and best-match kernels. The results indicate a connection between the different approaches and demonstrate the usefulness and broad applicability of the similarity-based entropy approach.
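The abstract does not state the entropy formula, so the following is only a minimal sketch under an assumption: that the entropy takes the similarity-sensitive Shannon form H = -Σ_i p_i ln(Σ_j K_ij p_j), with K the similarity matrix of the atomic environments and uniform weights p_i = 1/N. The function names `similarity_entropy` and `mixing_gain`, the atom-count-weighted reference entropy, and the toy matrices are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def similarity_entropy(K, p=None):
    """Assumed form: H = -sum_i p_i * ln(sum_j K_ij * p_j).

    K is an (N, N) similarity matrix with entries in [0, 1] and ones on
    the diagonal; p defaults to uniform weights 1/N.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    if p is None:
        p = np.full(n, 1.0 / n)
    return float(-np.sum(p * np.log(K @ p)))


def mixing_gain(K_aa, K_bb, K_ab):
    """Entropy of the combined environments minus the atom-count-weighted
    mean of the separate entropies (one possible definition, assumed here)."""
    K_aa, K_bb, K_ab = (np.asarray(m, dtype=float) for m in (K_aa, K_bb, K_ab))
    n_a, n_b = K_aa.shape[0], K_bb.shape[0]
    # Full similarity matrix of the mixture, built from the four blocks.
    K_mix = np.block([[K_aa, K_ab], [K_ab.T, K_bb]])
    h_separate = (n_a * similarity_entropy(K_aa)
                  + n_b * similarity_entropy(K_bb)) / (n_a + n_b)
    return similarity_entropy(K_mix) - h_separate


# Toy example: two "molecules", each with two mutually distinct environments.
K_a = np.eye(2)
K_b = np.eye(2)
gain_dissimilar = mixing_gain(K_a, K_b, np.zeros((2, 2)))  # no shared environments
gain_identical = mixing_gain(K_a, K_b, np.eye(2))          # same environments
```

Under these assumptions, the gain is zero when the two molecules have identical environments and reaches ln 2 for an equal-sized pair with no similarity between them, which is why such a quantity can serve as a (dis)similarity measure. In practice, K could come from a SOAP kernel or from comparing SMILES strings of local substructures, as the abstract describes.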