Learning protein constitutive motifs from sequence data.

Affiliation

Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Paris, France.

Publication information

Elife. 2019 Mar 12;8:e39397. doi: 10.7554/eLife.39397.

Abstract

Statistical analysis of evolutionarily related protein sequences provides information about their structure, function, and history. We show that Restricted Boltzmann Machines (RBM), designed to learn complex high-dimensional data and their statistical features, can efficiently model protein families from sequence information. Here we apply RBM to 20 protein families, and present detailed results for two short protein domains (Kunitz and WW), one long chaperone protein (Hsp70), and synthetic lattice proteins for benchmarking. The features inferred by the RBM are biologically interpretable: they are related to structure (residue-residue tertiary contacts, extended secondary motifs (α-helices and β-sheets), and intrinsically disordered regions), to function (activity and ligand specificity), or to phylogenetic identity. In addition, we use RBM to design new protein sequences with putative properties by composing the different modes and 'turning up' or 'turning down' each of them at will. Our work therefore shows that RBM are versatile and practical tools that can be used to unveil and exploit the genotype-phenotype relationship for protein families.
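
To make the idea of modeling sequences with an RBM and 'turning up' or 'turning down' a mode more concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: it assumes binary (Bernoulli) hidden units, a 21-letter alphabet (20 amino acids plus a gap symbol), and hypothetical parameter names `g` (site fields), `W` (weights), and `b` (hidden biases). Training is omitted entirely; the functions only show how a sequence is scored and how a new sequence could be sampled once the hidden units are clamped to chosen values.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus a gap symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(seq):
    """Encode a sequence of length L as an (L, q) one-hot matrix."""
    idx = np.array([AMINO_ACIDS.index(c) for c in seq])
    v = np.zeros((len(seq), len(AMINO_ACIDS)))
    v[np.arange(len(seq)), idx] = 1.0
    return v


def hidden_inputs(v, W):
    """Input received by each hidden unit; W has shape (M, L, q)."""
    return np.tensordot(W, v, axes=([1, 2], [0, 1]))


def free_energy(v, g, W, b):
    """Free energy of a sequence after summing out hidden units h in {0, 1}.

    F(v) = -sum_i g[i, v_i] - sum_mu log(1 + exp(I_mu(v) + b_mu))
    Lower values correspond to sequences the model finds more likely.
    """
    inputs = hidden_inputs(v, W)
    return -np.sum(g * v) - np.sum(np.logaddexp(0.0, inputs + b))


def sample_visible(h, g, W, rng):
    """Sample a sequence with the hidden units clamped to h.

    Setting an entry of h to 1 or 0 is a toy version of turning
    the corresponding mode up or down.
    """
    logits = g + np.tensordot(h, W, axes=(0, 0))  # shape (L, q)
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p = p / p.sum(axis=1, keepdims=True)
    idx = [rng.choice(len(AMINO_ACIDS), p=row) for row in p]
    return "".join(AMINO_ACIDS[i] for i in idx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, q, M = 6, len(AMINO_ACIDS), 3
    # Random parameters stand in for a trained model (illustration only).
    g = rng.normal(size=(L, q))
    W = rng.normal(size=(M, L, q))
    b = np.zeros(M)
    print(free_energy(one_hot("ACDEFG"), g, W, b))
    print(sample_visible(np.array([1.0, 0.0, 0.0]), g, W, rng))
```

In this toy setup, comparing samples drawn with a hidden unit clamped to 1 versus 0 shows which sequence positions that unit influences, which is the kind of inspection the abstract describes as biologically interpretable features.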

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d769/6436896/1c60dfb02ae8/elife-39397-fig1.jpg
