Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences.

Authors

David Kyle T, Halanych Kenneth M

Affiliations

Department of Biological Sciences, Auburn University, Auburn, AL, USA.

Center for Marine Sciences, University of North Carolina Wilmington, NC, USA.

Publication information

Genome Biol Evol. 2023 May 22;15(5). doi: 10.1093/gbe/evad084.

Abstract

Interpreting protein function from sequence data is a fundamental goal of bioinformatics. However, our current understanding of protein diversity is bottlenecked by the fact that most proteins have only been functionally validated in model organisms, limiting our understanding of how function varies with gene sequence diversity. Thus, accuracy of inferences in clades without model representatives is questionable. Unsupervised learning may help to ameliorate this bias by identifying highly complex patterns and structure from large datasets without external labels. Here we present DeepSeqProt, an unsupervised deep learning program for exploring large protein sequence datasets. DeepSeqProt is a clustering tool capable of distinguishing between broad classes of proteins while learning local and global structure of functional space. DeepSeqProt is capable of learning salient biological features from unaligned, unannotated sequences. DeepSeqProt is more likely to capture complete protein families and statistically significant shared ontologies within proteomes than other clustering methods. We hope this framework will prove of use to researchers and provide a preliminary step in further developing unsupervised deep learning in molecular biology.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a97/10231473/26fa8cad3849/evad084f1.jpg
