Zhuang Zixin, Barnard Amanda S
School of Computing, Australian National University, 145 Science Road, Acton, 2601, ACT, Australia.
J Cheminform. 2024 Apr 26;16(1):47. doi: 10.1186/s13321-024-00836-x.
Machine learning is a valuable tool that can accelerate the discovery and design of materials occupying combinatorial chemical spaces. However, the need for vast amounts of training data can be prohibitive when significant resources are required to characterize or simulate candidate structures. Recent results have shown that structure-free encoding of complex materials, based entirely on chemical compositions, can overcome this impediment and perform well in unsupervised learning tasks. In this study, we extend this exploration to supervised classification, and show how structure-free encoding can accurately predict classes of material compounds for battery applications without time-consuming measurement of bonding networks, lattices or densities. SCIENTIFIC CONTRIBUTION: Structure-free encodings of complex materials are comprehensively evaluated in classification tasks, including binary and multi-class separation, using three classifiers based on different logic functions and measured with four metrics and learning curves. The encoding is applied to two data sets from computational and experimental sources, and the outcomes are visualised using five approaches to confirm the suitability and superiority of Mendeleev encoding. These methods are general and accessible using source software, to provide simple, intuitive and interpretable materials informatics outcomes to accelerate materials design.
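As an illustration of what an encoding "based entirely on chemical compositions" can look like, the sketch below maps a formula string to a fixed-length vector of element fractions. It is a minimal sketch under stated assumptions, not the Mendeleev encoding evaluated in the paper: the element list `ELEMENTS`, the helper names `parse_formula` and `encode`, and the example formulas are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical element ordering, for illustration only. The paper's
# Mendeleev encoding may order, select and weight elements differently.
ELEMENTS = ["Li", "O", "Mn", "Fe", "Co", "Ni", "P"]

# An element symbol followed by an optional integer or decimal amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula):
    """Return element -> amount for a flat formula such as 'LiFePO4'.

    Parentheses and hydrates are not handled in this sketch.
    """
    counts = Counter()
    for symbol, amount in TOKEN.findall(formula):
        counts[symbol] += float(amount) if amount else 1.0
    return counts


def encode(formula, elements=ELEMENTS):
    """Encode a formula as atomic fractions over a fixed element list.

    No structural information (bonding, lattice, density) is used.
    Elements missing from `elements` still count towards the total
    but get no slot in the output vector.
    """
    counts = parse_formula(formula)
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]


if __name__ == "__main__":
    for formula in ("LiCoO2", "LiFePO4"):
        print(formula, encode(formula))
```

Vectors of this kind can be passed directly as feature rows to any standard classifier, which is what makes the approach independent of structural characterization.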