Roncoroni Fabrice, Sanz-Matias Ana, Sundararaman Siddharth, Prendergast David
Joint Center for Energy Storage Research, The Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
Phys Chem Chem Phys. 2023 May 17;25(19):13741-13754. doi: 10.1039/d3cp00525a.
Molecular dynamics (MD) simulations present a data-mining challenge, given that they can generate a considerable amount of data but often rely on limited or biased human interpretation to examine their information content. By not asking the right questions of MD data we may miss critical information hidden within it. We combine dimensionality reduction (UMAP) and unsupervised hierarchical clustering (HDBSCAN) to quantitatively characterize prevalent coordination environments of chemical species within MD data. By focusing on local coordination, we significantly reduce the amount of data to be analyzed by extracting all distinct molecular formulas within a given coordination sphere. We then efficiently combine UMAP and HDBSCAN with alignment or shape-matching algorithms to partition these formulas into structural isomer families indicating their relative populations. The method was employed to reveal details of cation coordination in electrolytes based on molecular liquids.
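The data-reduction step described above reduces each cation's surroundings to a formula and keeps only the distinct formulas. It can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a cubic periodic box, counts individual atoms (rather than whole molecules) within a fixed cutoff of each cation, and writes formulas in Hill order. The function names `coordination_formula` and `tally_formulas`, the cutoff, and the toy coordinates are all hypothetical.

```python
import math
from collections import Counter

def minimum_image(d, box):
    """Wrap one displacement component into [-box/2, box/2) for a cubic periodic box."""
    return d - box * math.floor(d / box + 0.5)

def hill_formula(counts):
    """Hill order: C, then H, then the rest alphabetically; without C, all alphabetical."""
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)

def coordination_formula(center, atoms, box, cutoff):
    """Formula of all atoms (element, (x, y, z)) lying within `cutoff` of `center`."""
    counts = Counter()
    for element, pos in atoms:
        dx, dy, dz = (minimum_image(p - c, box) for p, c in zip(pos, center))
        if dx * dx + dy * dy + dz * dz <= cutoff * cutoff:
            counts[element] += 1
    return hill_formula(counts)

def tally_formulas(frames, center_element, box, cutoff):
    """Count how often each distinct coordination formula occurs over all frames."""
    tally = Counter()
    for frame in frames:
        for i, (element, pos) in enumerate(frame):
            if element != center_element:
                continue
            neighbours = [atom for j, atom in enumerate(frame) if j != i]
            tally[coordination_formula(pos, neighbours, box, cutoff)] += 1
    return tally

# Toy data (hypothetical): two frames, one Li+ each, box edge 10, cutoff 2.5.
frames = [
    [("Li", (0, 0, 0)), ("O", (2, 0, 0)), ("O", (0, 2, 0)),
     ("O", (9, 0, 0)),   # its periodic image lies 1.0 away from Li
     ("C", (4, 0, 0))],  # outside the cutoff
    [("Li", (5, 5, 5)), ("O", (6, 5, 5)), ("C", (5, 6.5, 5)), ("H", (5, 5, 6))],
]
tally = tally_formulas(frames, "Li", box=10.0, cutoff=2.5)
print(tally)  # Counter({'O3': 1, 'CHO': 1})
```

The formula counts give the relative populations that the later clustering step subdivides. In a real electrolyte one would more likely count coordinating molecules (for example, by molecule ID) than bare atoms. That changes only what goes into `counts`.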
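After environments are grouped by formula, each group still has to be split into structural isomer families. The paper does this with UMAP, HDBSCAN, and alignment or shape-matching algorithms. The sketch below swaps in a much simpler technique for illustration only. It uses a rotation-, translation- and permutation-invariant fingerprint (interatomic distances sorted by element pair) plus greedy threshold grouping. `fingerprint`, `group_isomers`, the tolerance `tol`, and the example geometries are hypothetical. Coordinates are assumed to be already unwrapped around the central ion.

```python
import math
from itertools import combinations

def fingerprint(env):
    """Interatomic distances sorted by (element-pair label, distance).

    env: list of (element, (x, y, z)) including the central ion, with periodic
    images already unwrapped around it (an assumption of this sketch). The result
    does not depend on rotation, translation, or the order atoms are listed in.
    """
    pairs = sorted(
        ("-".join(sorted((ea, eb))), math.dist(pa, pb))
        for (ea, pa), (eb, pb) in combinations(env, 2)
    )
    return [label for label, _ in pairs], [d for _, d in pairs]

def group_isomers(envs, tol=0.2):
    """Greedy threshold grouping (a stand-in, not UMAP/HDBSCAN): each environment
    joins the first family whose representative has identical pair labels and
    every sorted distance within `tol`; otherwise it starts a new family."""
    families = []  # entries: (representative labels, representative distances, member indices)
    for idx, env in enumerate(envs):
        labels, dists = fingerprint(env)
        for rep_labels, rep_dists, members in families:
            if rep_labels == labels and max(
                (abs(a - b) for a, b in zip(rep_dists, dists)), default=0.0
            ) <= tol:
                members.append(idx)
                break
        else:
            families.append((labels, dists, [idx]))
    return [members for _, _, members in families]

# Hypothetical Li(O)2 environments: two linear, two bent (about 90 degrees).
linear = [("Li", (0, 0, 0)), ("O", (2, 0, 0)), ("O", (-2, 0, 0))]
bent = [("Li", (0, 0, 0)), ("O", (2, 0, 0)), ("O", (0, 2, 0))]
linear_rotated = [("Li", (0, 0, 0)), ("O", (0, 2, 0)), ("O", (0, -2, 0))]
bent_perturbed = [("Li", (0, 0, 0)), ("O", (2.05, 0, 0)), ("O", (0, 2, 0))]

families = group_isomers([linear, bent, linear_rotated, bent_perturbed], tol=0.2)
print(families)  # [[0, 2], [1, 3]]
```

Family sizes divided by the total number of environments give relative isomer populations (here 0.5 each). Sorted distance lists can occasionally coincide for genuinely different shapes. That is one reason a true alignment (for example, Kabsch superposition) or a learned embedding is preferable in practice.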