Tuan-Anh Tran, Robert Zaleśny
Oxford University Clinical Research Unit, Wellcome Trust Major Overseas Programme Viet Nam, 764 Vo Van Kiet, Quan 5, Ho Chi Minh City, Vietnam.
Department of Physical and Quantum Chemistry, Faculty of Chemistry, Wrocław University of Science and Technology, Wyb. Wyspiańskiego 27, PL-50370 Wrocław, Poland.
ACS Omega. 2020 Mar 9;5(10):5318-5325. doi: 10.1021/acsomega.9b04339. eCollection 2020 Mar 17.
There is a pressing need to adopt machine learning techniques to screen and discover new materials that could address many societal and technological challenges. In this work, we follow this trend and employ machine learning to study (high-order) electric properties of organic compounds. The results of quantum-chemistry calculations of polarizability and first hyperpolarizability, obtained for more than 50,000 compounds, served as targets for machine learning-based predictions. The studied set of molecular structures encompasses organic push-pull molecules with variable linker lengths. Moreover, a diversified set of linkers, composed of alternating single/double and single/triple carbon-carbon bonds, was considered. This study demonstrates that the applied machine learning strategy yields correlation coefficients between predicted and reference values of (hyper)polarizabilities exceeding 0.9 on the training, validation, and test sets. However, to achieve such predictive power, one needs to choose the training set appropriately, as the machine learning methods are very sensitive to the linker-type diversity in the training set and yield catastrophic predictions in certain cases. Furthermore, the dependence of (hyper)polarizability on the length of spacers was studied in detail, which helps explain the appreciably high accuracy of the employed approaches.
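The quality measure quoted in the abstract, a correlation coefficient between predicted and reference (hyper)polarizabilities reported per data split, can be illustrated with a minimal sketch. It assumes the Pearson definition of the coefficient, which the abstract does not specify; the function name `pearson_r` and all numbers are hypothetical placeholders, not data or code from the study.

```python
import math


def pearson_r(predicted, reference):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_r)


# Hypothetical (predicted, reference) pairs per split; NOT values from the paper.
splits = {
    "training": ([1.0, 2.1, 2.9, 4.2], [1.0, 2.0, 3.0, 4.0]),
    "validation": ([1.2, 1.9, 3.3, 3.8], [1.0, 2.0, 3.0, 4.0]),
    "test": ([0.8, 2.4, 2.7, 4.1], [1.0, 2.0, 3.0, 4.0]),
}

for name, (pred, ref) in splits.items():
    print(f"{name}: r = {pearson_r(pred, ref):.3f}")
```

A high coefficient on all three splits only indicates generalization if each split covers the same linker types. Evaluating the coefficient separately per linker type would be one way to expose the failure mode the abstract describes.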