A most wanted list of conserved microbial protein families with no known domains.

Author information

Gladstone Institutes, San Francisco, CA, United States of America.

University of California, Berkeley, CA, United States of America.

Publication information

PLoS One. 2018 Oct 17;13(10):e0205749. doi: 10.1371/journal.pone.0205749. eCollection 2018.

Abstract

The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains, ranks these with respect to phylogenetic breadth, and identifies them in metagenomics data. We applied this approach to 271 965 protein families from the SFams database and discovered many with no functional annotation, including >118 000 families lacking any known protein domain. From these, we prioritized 6 668 conserved protein families with at least three sequences from organisms in at least two distinct classes. These Function Unknown Families (FUnkFams) are present in Tara Oceans Expedition and Human Microbiome Project metagenomes, with distributions associated with sampling environment. Our findings highlight the extent of functional novelty in sequence databases and establish an approach for creating a "most wanted" list of genes to prioritize for further characterization.
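
The prioritization criteria stated in the abstract (no annotated domain, at least three sequences, organisms from at least two distinct classes) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Family` record, its field names, and the `prioritize` function are hypothetical, and ranking by the count of distinct taxonomic classes is only an assumed stand-in for the paper's measure of phylogenetic breadth, which the abstract does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Family:
    """Hypothetical record for one protein family (not the SFams schema)."""
    family_id: str
    # One (sequence_id, taxonomic_class) pair per member sequence.
    members: list[tuple[str, str]]
    # Names of annotated domains; empty when no domain is known.
    domains: list[str] = field(default_factory=list)


def prioritize(families, min_sequences=3, min_classes=2):
    """Return IDs of domain-free families that pass the abstract's thresholds.

    Families are ordered by the number of distinct taxonomic classes
    (descending), an assumed proxy for phylogenetic breadth. Ties are
    broken by family ID so the result is deterministic.
    """
    selected = []
    for fam in families:
        if fam.domains:
            continue  # has an annotated domain, so it is not "function unknown"
        classes = {tax_class for _, tax_class in fam.members}
        if len(fam.members) >= min_sequences and len(classes) >= min_classes:
            selected.append((fam.family_id, len(classes)))
    selected.sort(key=lambda item: (-item[1], item[0]))
    return [family_id for family_id, _ in selected]


if __name__ == "__main__":
    # Toy data for illustration only; these are not families from the paper.
    toy = [
        Family("fam_A", [("s1", "Bacilli"), ("s2", "Clostridia"), ("s3", "Bacilli")]),
        Family("fam_B", [("s1", "Bacilli"), ("s2", "Bacilli"), ("s3", "Bacilli")]),
        Family("fam_C", [("s1", "Bacilli"), ("s2", "Clostridia")], ["PF00001"]),
    ]
    print(prioritize(toy))
```

In this toy input only `fam_A` passes: `fam_B` spans a single class and `fam_C` carries a domain annotation and has too few sequences.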
