Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow Campus, Lucknow, Uttar Pradesh, India.
Computer and Information Sciences Department, Universiti Teknologi Petronas, 32610, Seri Iskandar, Perak, Malaysia.
Environ Sci Pollut Res Int. 2021 Sep;28(34):47641-47650. doi: 10.1007/s11356-021-14028-9. Epub 2021 Apr 24.
We are exposed almost every day to chemical compounds present in the environment, cosmetics, and drugs. Mutagenicity is a key property in establishing a chemical compound's safety. Exposure to and handling of mutagenic chemicals in the environment pose a high health risk; therefore, identifying and screening these chemicals is essential. Given time constraints and the pressure to avoid the use of laboratory animals, a shift to alternative methodologies that enable rapid, cost-effective detection without undue over-conservatism seems critical. In this regard, computational detection and identification of mutagens in environmental samples such as drugs, pesticides, dyes, reagents, wastewater, cosmetics, and other substances is vital. Over the last two decades, there have been numerous efforts to develop prediction models for mutagenicity, and machine learning methods have so far demonstrated noteworthy performance and reliability. However, the accuracy of such prediction models has always been a major concern for researchers working in this area. In this study, mutagenicity prediction models were developed using a deep neural network (DNN), support vector machine, k-nearest neighbor, and random forest. The classifiers were trained on 3039 compounds and validated on 1014 compounds, each encoded as a vector of 1597 molecular features. The DNN-based prediction model yielded the highest prediction accuracy, 92.95% on the training data and 83.81% on the test data. The areas under the receiver operating characteristic curve and the precision-recall curve were 0.894 and 0.838, respectively. The DNN-based classifier not only fits the data better than traditional machine learning algorithms, viz., support vector machine, k-nearest neighbor, and random forest (with and without feature reduction), but also yields better performance metrics.
In the current work, we propose a DNN-based model to predict the mutagenicity of compounds.
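The abstract reports three evaluation metrics for the binary mutagen/non-mutagen classifier: accuracy, area under the receiver operating characteristic curve, and area under the precision-recall curve. As a rough illustration of how such values are typically computed from predicted scores, here is a minimal dependency-free sketch. It is not the authors' evaluation code; the function names, the 0.5 decision threshold, the toy labels and scores, and the use of average precision as the precision-recall summary are all assumptions made for this example.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of compounds whose thresholded score matches the label.

    The 0.5 threshold is an assumption, not taken from the abstract.
    """
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    Equals the probability that a randomly chosen positive (mutagen) is
    scored above a randomly chosen negative; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise summary of the precision-recall curve.

    Averages the precision measured at each rank where a positive is
    retrieved. Assumes no tied scores.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("no positive labels")
    return total / hits


# Toy data (hypothetical): 1 = mutagen, 0 = non-mutagen.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_labels, toy_scores))
    print("ROC AUC:", roc_auc(toy_labels, toy_scores))
    print("average precision:", average_precision(toy_labels, toy_scores))
```

Accuracy depends on the chosen threshold, whereas the two area metrics summarize ranking quality across all thresholds, which is presumably why the abstract reports them alongside accuracy.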