CAT-CPI: Combining CNN and transformer to learn compound image features for predicting compound-protein interactions.
Authors
Qian Ying, Wu Jian, Zhang Qian
Affiliation
Shanghai Frontiers Science Center of Molecule Intelligent Syntheses, School of Computer Science and Technology, East China Normal University, Shanghai, China.
Publication information
Front Mol Biosci. 2022 Sep 15;9:963912. doi: 10.3389/fmolb.2022.963912. eCollection 2022.
Compound-protein interaction (CPI) prediction is a foundational task in drug discovery, a process that is time-consuming and costly. Deep learning methods can greatly improve the effectiveness of CPI prediction and thereby accelerate drug development. A large body of recent research in computer vision, especially in deep learning, has shown that the position, geometry, spatial structure, and other features of objects in an image can be well characterized. We propose a novel molecular image-based model named CAT-CPI (combining CNN and transformer to predict CPI) for the CPI task. We use a convolutional neural network (CNN) to learn local features of molecular images and then use a transformer encoder to capture the semantic relationships among these features. To extract protein sequence features, we propose a k-gram based method and obtain the semantic relationships of sub-sequences with a transformer encoder. In addition, we build a Feature Relearning (FR) module to learn the interaction features of compounds and proteins. We evaluated CAT-CPI on three benchmark datasets (Human, Celegans, and Davis), and the experimental results demonstrate that CAT-CPI performs competitively against state-of-the-art predictors. We also carry out Drug-Drug Interaction (DDI) experiments to verify the strong potential of methods based on molecular images and the FR module.
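The k-gram step for protein sequences mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the value of k, whether the windows overlap, or how the vocabulary is built, so everything below is an assumption made for illustration: `k = 3`, a stride of 1, and the helper names `kgrams`, `build_vocab`, and `encode` are hypothetical and not taken from the paper. The resulting integer ids are the kind of token sequence that could then be embedded and passed to a transformer encoder.

```python
def kgrams(sequence, k=3):
    """Split a protein sequence into overlapping k-grams (stride 1).

    k=3 and stride 1 are illustrative assumptions, not values from the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields no k-grams.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences, k=3):
    """Assign an integer id to each distinct k-gram; id 0 is reserved for unknowns."""
    vocab = {}
    for seq in sequences:
        for gram in kgrams(seq, k):
            # Ids are assigned in order of first appearance, starting at 1.
            vocab.setdefault(gram, len(vocab) + 1)
    return vocab


def encode(sequence, vocab, k=3):
    """Turn a sequence into k-gram ids, mapping unseen k-grams to 0."""
    return [vocab.get(gram, 0) for gram in kgrams(sequence, k)]


if __name__ == "__main__":
    train = ["MKTAY"]                      # toy sequence, not real data
    vocab = build_vocab(train, k=3)
    print(kgrams("MKTAY", 3))
    print(encode("MKTAYMKT", vocab, k=3))
```

Overlapping windows keep every residue in several tokens, which preserves local context at the cost of a longer token sequence; a non-overlapping split would be an equally plausible reading of "k-gram based".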