Convolutional Neural Network Model Based on 2D Fingerprint for Bioactivity Prediction.

Affiliations

Laboratory of Advanced Electronics Systems (LSEA), University of Medea, Medea 26000, Algeria.

UTM Big Data Centre, Ibnu Sina Institute for Scientific and Industrial Research, Universiti Teknologi Malaysia, Johor Bahru 81310, Johor, Malaysia.

Publication information

Int J Mol Sci. 2022 Oct 30;23(21):13230. doi: 10.3390/ijms232113230.

Abstract

Determining and modeling the possible behaviour and actions of molecules requires investigating the basic structural features and physicochemical properties that govern their behaviour during chemical, physical, biological, and environmental processes. Computational approaches such as machine learning methods offer an alternative means of predicting the physicochemical properties of molecules from their structures. However, the limited accuracy and high error rates of such predictions restrict their use. In this paper, a novel technique based on a deep learning convolutional neural network (CNN) is proposed and developed for predicting the bioactivity of chemical compounds. The molecules are represented in a new matrix format, Mol2mat, a molecular matrix representation adapted from the well-known 2D fingerprint descriptors. To evaluate the performance of the proposed method, a series of experiments was conducted on two standard datasets, the MDL Drug Data Report (MDDR) and Sutherland datasets, comprising 10 homogeneous and 14 heterogeneous activity classes. After eight fingerprints were analysed, all possible combinations of the five best descriptors were investigated. The results showed that a combination of three fingerprints, ECFP4, EPFP4, and ECFC4, together with CNN-based activity prediction, achieved the highest performance, an AUC of 98%, compared with the state-of-the-art ML algorithms NaiveB, LSVM, and RBFN.
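
As a rough illustration of the two ideas in the abstract, the sketch below folds a flat fingerprint into a square matrix that a CNN could take as input, and enumerates the combinations that can be drawn from five descriptors. The abstract does not describe the Mol2mat layout, so the 1024-element length, the 32 x 32 row-major fold, the stacking of several fingerprints as channels, and the function names are all assumptions made for illustration, not the authors' method. Only ECFP4, EPFP4, and ECFC4 are named in the abstract; `FP_A` and `FP_B` are placeholders, and the random vectors stand in for real fingerprints.

```python
import random
from itertools import combinations


def fingerprint_to_matrix(values, side=32):
    """Fold a flat fingerprint of length side*side into a side x side matrix.

    Assumption: simple row-major fold; the real Mol2mat layout may differ.
    """
    values = list(values)
    if len(values) != side * side:
        raise ValueError(f"expected {side * side} values, got {len(values)}")
    return [values[r * side:(r + 1) * side] for r in range(side)]


def stack_fingerprints(fingerprints, side=32):
    """Treat each fingerprint of one molecule as one input channel."""
    return [fingerprint_to_matrix(fp, side) for fp in fingerprints]


def enumerate_combinations(names):
    """All non-empty combinations of descriptor names (sizes 1..len(names))."""
    return [
        combo
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


# Hypothetical random bit vectors standing in for three real fingerprints.
rng = random.Random(0)
fingerprints = [[rng.randint(0, 1) for _ in range(1024)] for _ in range(3)]
channels = stack_fingerprints(fingerprints)

# The last two names are placeholders; the abstract names only three.
descriptors = ["ECFP4", "EPFP4", "ECFC4", "FP_A", "FP_B"]
combos = enumerate_combinations(descriptors)

print(len(channels), len(channels[0]), len(channels[0][0]))  # channels, rows, cols
print(len(combos))  # number of candidate descriptor combinations
```

Under these assumptions, three fingerprints give a 3-channel 32 x 32 input, and five descriptors give 31 non-empty combinations, one of which is the ECFP4, EPFP4, ECFC4 triple reported as best.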

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdfe/9657591/3c8ffce6281e/ijms-23-13230-g001.jpg
