Sharma Ashok K, Srivastava Gopal N, Roy Ankita, Sharma Vineet K
Metagenomics and Systems Biology Laboratory, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, India.
Front Pharmacol. 2017 Nov 30;8:880. doi: 10.3389/fphar.2017.00880. eCollection 2017.
Experimental methods for predicting molecular toxicity are tedious and time-consuming, so computational approaches can be used to develop alternative methods for toxicity prediction. We have developed a tool for predicting the molecular toxicity, aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive, curated set of toxin molecules as a training set, chemical and structural features such as descriptors and fingerprints were exploited for feature selection, optimization and the development of machine learning based classification and regression models. Compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence the molecular features were used for classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid classification models showed similar accuracy (93%) and Matthews correlation coefficient (0.84). The performance of all three models was comparable on the blind dataset (Matthews correlation coefficient = 0.84-0.87). In addition, regression models using descriptors as input features were compared and evaluated on the blind dataset. The random forest based regression model performed better for the prediction of solubility (R² = 0.84) than the multi-linear regression (MLR) and partial least squares regression (PLSR) models, whereas the PLSR model performed better for the prediction of permeability (Caco-2) (R² = 0.68) than the random forest and MLR based regression models. The performance of the final classification and regression models was evaluated using two validation datasets comprising known toxins and commonly used constituents of health products, which attests to their accuracy.
The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.
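For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how the Matthews correlation coefficient and the coefficient of determination (R²) are conventionally computed. It is not the ToxiM implementation; the function names and the confusion-matrix counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention),
    since the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical counts for a toxin / non-toxin classifier (illustrative only).
    print(matthews_corrcoef(tp=45, tn=48, fp=2, fn=5))
    # Hypothetical measured vs. predicted values (illustrative only).
    print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when the toxin and non-toxin classes differ in size.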