Department of Chemical Engineering, Tokyo University of Agriculture and Technology, Japan.
Research Institute of Science for Safety and Sustainability, National Institute of Advanced Industrial Science and Technology (AIST), Japan.
Chemosphere. 2020 Jan;238:124604. doi: 10.1016/j.chemosphere.2019.124604. Epub 2019 Aug 16.
Accurate in silico prediction of the ecotoxicity of chemical substances has become an important issue in recent years. Most conventional methods, such as the Ecological Structure-Activity Relationship (ECOSAR) model, cluster chemical substances empirically based on structural information and then predict toxicity with a linear regression model on log P. Because the classification is empirical, prediction accuracy does not improve even when new ecotoxicity test data are added. In addition, most conventional methods are not appropriate for predicting the ecotoxicity of inorganic and/or ionized compounds. Furthermore, users have difficulty handling cases in which multiple Quantitative Structure-Activity Relationship (QSAR) formulas apply to a single chemical substance. To overcome these flaws, this study developed a new method that applies unsupervised machine learning and graph theory to predict acute ecotoxicity. The proposed machine learning technique is based on the large ecotoxicity test dataset of AIST-MeRAM, a software program developed by the National Institute of Advanced Industrial Science and Technology for Multi-purpose Ecological Risk Assessment and Management, and on the Molecular ACCess System (MACCS) keys, which vectorize a chemical structure into 166-bit binary information. Acute toxicity to fish, daphnids, and algae can be predicted with good accuracy without the log P and linear regression models required by existing methods. Results from the new method were cross-validated and compared with ECOSAR predictions; they show that the new method provides better accuracy for a wider range of chemical substances, including inorganic and ionized compounds.
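To make the idea of predicting toxicity from 166-bit structural keys concrete, the following is a minimal sketch, not the authors' algorithm. The abstract does not specify the unsupervised learning or graph-theoretic steps, so this example swaps in a plain similarity-weighted nearest-neighbour estimate using Tanimoto similarity. The function names, the parameter `k`, and all bit indices and toxicity values are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List, Tuple

# A fingerprint is stored as the set of bit positions that are "on"
# in a 166-bit key vector (assumption: positions are plain integers).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def predict_log_toxicity(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, float]],
    k: int = 3,
) -> float:
    """Similarity-weighted mean of the log toxicity of the k most similar compounds.

    This is an illustrative stand-in, not the method described in the abstract.
    """
    scored = sorted(
        ((tanimoto(query, fp), value) for fp, value in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total_weight = sum(sim for sim, _ in scored)
    if total_weight == 0.0:
        raise ValueError("no training compound shares a set bit with the query")
    return sum(sim * value for sim, value in scored) / total_weight


# Toy data: invented bit patterns and invented log toxicity values.
toy_training = [
    (frozenset({1, 2, 3}), 1.0),
    (frozenset({2, 3, 4}), 2.0),
    (frozenset({10, 11}), 5.0),
]
toy_query = frozenset({1, 2, 3})
toy_prediction = predict_log_toxicity(toy_query, toy_training, k=2)
```

Because such an estimate depends only on the fingerprint bits, it needs no log P value and no per-class regression formula, which is the property the abstract highlights. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.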