The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.

Author information

Yooseph Shibu, Sutton Granger, Rusch Douglas B, Halpern Aaron L, Williamson Shannon J, Remington Karin, Eisen Jonathan A, Heidelberg Karla B, Manning Gerard, Li Weizhong, Jaroszewski Lukasz, Cieplak Piotr, Miller Christopher S, Li Huiying, Mashiyama Susan T, Joachimiak Marcin P, van Belle Christopher, Chandonia John-Marc, Soergel David A, Zhai Yufeng, Natarajan Kannan, Lee Shaun, Raphael Benjamin J, Bafna Vineet, Friedman Robert, Brenner Steven E, Godzik Adam, Eisenberg David, Dixon Jack E, Taylor Susan S, Strausberg Robert L, Frazier Marvin, Venter J Craig

Affiliation

J. Craig Venter Institute, Rockville, Maryland, United States of America.

Publication information

PLoS Biol. 2007 Mar;5(3):e16. doi: 10.1371/journal.pbio.0050016.

PMID:17355171

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1821046/

Abstract

Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
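The abstract states only that proteins were grouped by sequence similarity clustering and that clusters made up solely of GOS sequences were counted. To make the idea of a "GOS-only cluster" concrete, the sketch below links any two sequences whose pairwise identity passes a cutoff (single-linkage clustering via a union-find structure) and then keeps the clusters whose members all carry a GOS label. This is not the authors' pipeline: the identity cutoff, the "GOS"/"KNOWN" labels, the minimum cluster size, and the toy hit list are all hypothetical, and a real analysis would start from an all-versus-all similarity search over millions of sequences.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_sequences(seq_source, hits, min_identity=0.3):
    """Single-linkage clustering over pairwise hits.

    seq_source: dict mapping sequence id -> "GOS" or "KNOWN" (hypothetical labels).
    hits: iterable of (id_a, id_b, identity) tuples, e.g. from an all-vs-all search.
    min_identity: hypothetical cutoff for linking two sequences.
    """
    uf = UnionFind(seq_source)
    for a, b, identity in hits:
        if identity >= min_identity:
            uf.union(a, b)
    clusters = defaultdict(list)
    for seq_id in seq_source:
        clusters[uf.find(seq_id)].append(seq_id)
    return list(clusters.values())


def gos_only_clusters(clusters, seq_source, min_size=2):
    """Clusters with at least min_size members, all of them labeled GOS."""
    return [
        c for c in clusters
        if len(c) >= min_size and all(seq_source[s] == "GOS" for s in c)
    ]


if __name__ == "__main__":
    # Toy data (hypothetical): two known proteins and four GOS proteins.
    source = {"k1": "KNOWN", "k2": "KNOWN",
              "g1": "GOS", "g2": "GOS", "g3": "GOS", "g4": "GOS"}
    hits = [("k1", "g1", 0.55), ("k1", "k2", 0.80),
            ("g2", "g3", 0.40), ("g3", "g4", 0.35),
            ("g1", "g2", 0.10)]  # below the cutoff, so g1 and g2 stay apart
    clusters = cluster_sequences(source, hits)
    print(gos_only_clusters(clusters, source))  # [['g2', 'g3', 'g4']]
```

In the toy run, g1 joins the known proteins k1 and k2, while g2, g3 and g4 form a cluster with no known member, which is the kind of group the abstract counts as GOS-only. Single linkage is used here only because it is the simplest graph-based clustering; it tends to chain distant sequences together, which is one reason practical pipelines add stricter membership criteria.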
