Deep Ensemble Learning with Atrous Spatial Pyramid Networks for Protein Secondary Structure Prediction.

Authors

Guo Yuzhi, Wu Jiaxiang, Ma Hehuan, Wang Sheng, Huang Junzhou

Affiliations

Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA.

AI Lab, Tencent, Shenzhen 508929, China.

Publication information

Biomolecules. 2022 Jun 2;12(6):774. doi: 10.3390/biom12060774.

Abstract

The secondary structure of proteins is important for studying their three-dimensional structure and function. Several models from image understanding and natural language modeling, such as the Long Short-Term Memory (LSTM) network and the Convolutional Neural Network (CNN), have been successfully adapted to protein sequence studies. Recently, the Gated Convolutional Neural Network (GCNN) was proposed for natural language processing; it achieved strong sentence-scoring performance while reducing latency. Conditionally Parameterized Convolution (CondConv) is another recent technique that has been highly successful in image processing. Compared with a vanilla CNN, CondConv uses extra sample-dependent modules to conditionally adjust the convolutional network. In this paper, we propose a novel Conditionally Parameterized Convolutional network (CondGCNN) which utilizes the power of both CondConv and GCNN. An ensemble encoder combines the capabilities of LSTM and CondGCNN to encode protein sequences and better capture their sequential features. In addition, we explore the similarity between the secondary structure prediction problem and the image segmentation problem, and propose an ASP network (an Atrous Spatial Pyramid Pooling (ASPP) based network) to capture fine boundary details in secondary structure. Extensive experiments show that the proposed method achieves higher performance than existing methods on the protein secondary structure prediction task on the CB513, CASP11, CASP12, CASP13, and CASP14 datasets. We also conducted ablation studies on each component to verify its effectiveness. We expect our method to be useful for any protein-related prediction task, not only protein secondary structure prediction.
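
The three building blocks named in the abstract (CondConv-style sample-dependent kernels, GCNN-style gating, and parallel atrous convolutions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, layer sizes, number of experts, dilation rates, and the 21-channel per-residue input are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_same(x, w, dilation=1):
    """1D convolution with zero 'same' padding.
    x: (L, C_in); w: (K, C_in, C_out) with odd K. Returns (L, C_out)."""
    k = w.shape[0]
    pad = dilation * (k // 2)
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], w.shape[2]))
    for j in range(k):
        # Tap j reads the padded input shifted by j * dilation positions.
        out += xp[j * dilation : j * dilation + x.shape[0]] @ w[j]
    return out

def cond_kernel(x, experts, routing):
    """CondConv-style kernel: a per-sample weighted sum of expert kernels.
    experts: (E, K, C_in, C_out); routing: (C_in, E)."""
    # Routing weights come from the sequence-averaged input, so they
    # depend on the sample being processed.
    r = sigmoid(x.mean(axis=0) @ routing)        # (E,)
    return np.tensordot(r, experts, axes=1)      # (K, C_in, C_out)

def cond_gated_conv(x, experts_a, experts_b, routing):
    """GCNN-style gating: linear branch times sigmoid gate branch,
    with both kernels conditionally parameterized."""
    a = conv1d_same(x, cond_kernel(x, experts_a, routing))
    b = conv1d_same(x, cond_kernel(x, experts_b, routing))
    return a * sigmoid(b)

def atrous_pyramid(h, kernels, rates=(1, 2, 4)):
    """Parallel dilated convolutions concatenated along channels."""
    branches = [conv1d_same(h, w, dilation=r) for w, r in zip(kernels, rates)]
    return np.concatenate(branches, axis=1)

# Hypothetical sizes: 30 residues, 21 input channels, 3 experts, kernel width 3.
rng = np.random.default_rng(0)
L, C_in, C_hid, E, K = 30, 21, 8, 3, 3
x = rng.normal(size=(L, C_in))
experts_a = 0.1 * rng.normal(size=(E, K, C_in, C_hid))
experts_b = 0.1 * rng.normal(size=(E, K, C_in, C_hid))
routing = 0.1 * rng.normal(size=(C_in, E))
pyramid_kernels = [0.1 * rng.normal(size=(K, C_hid, 4)) for _ in range(3)]

h = cond_gated_conv(x, experts_a, experts_b, routing)
feats = atrous_pyramid(h, pyramid_kernels)
print(h.shape, feats.shape)  # (30, 8) (30, 12)
```

Every stage keeps the sequence length, so a per-residue classifier could be attached to `feats`. The increasing dilation rates widen the context seen at each position without adding parameters, which is the property the abstract borrows from image segmentation.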

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5484/9221033/47ab3039fb28/biomolecules-12-00774-g001.jpg