Integrating Markov clustering and molecular phylogenetics to reconstruct the cyanobacterial species tree from conserved protein families.

Author information

Swingley Wesley D, Blankenship Robert E, Raymond Jason

Affiliation

Institute of Low Temperature Science, Hokkaido University, Sapporo, Japan.

Publication information

Mol Biol Evol. 2008 Apr;25(4):643-54. doi: 10.1093/molbev/msn034. Epub 2008 Feb 22.

Abstract

Attempts to classify living organisms by their physical characteristics are as old as biology itself. The advent of protein and DNA sequencing, most notably the use of 16S ribosomal RNA, defined a new level of classification that now forms our basic understanding of the history of life on earth. High-throughput sequencing now generates DNA sequences at an unprecedented rate, providing a wealth of information but also posing considerable analytical challenges. Here we present comparative genomics-based methods useful for automating evolutionary analysis across any number of species. As a practical example, we applied our methods to the well-studied cyanobacterial lineage. The 24 cyanobacteria whose genomes are compared here occupy a wide variety of environmental niches and play major roles in global carbon and nitrogen cycles. By integrating phylogenetic data inferred for upward of 1,000 protein-coding genes common to all or most cyanobacteria, we have reconstructed an evolutionary history of the phylum, establishing a framework for resolving key issues regarding the evolution of their metabolic and phenotypic diversity. Greater resolution on individual branches can be attained by telescoping inward to the larger set of conserved proteins shared among fewer taxa. Constructing individual phylogenies for all protein families allows quantitative tree scoring, providing insight into the evolutionary history of each protein family as well as probing the limits of phylogenetic resolution. The tools incorporated here are fast, computationally tractable, and easily extendable to other phyla, and they provide a scalable framework for contrasting and integrating the information present in thousands of protein-coding genes within related genomes.
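The title names Markov clustering (MCL, the graph-clustering algorithm developed by Stijn van Dongen) as the step that groups proteins into conserved families, but the abstract does not describe how the authors built their similarity graph or which parameters they used. The sketch below is therefore only a generic, textbook version of MCL in pure Python, not the authors' pipeline. The function names (`normalize_columns`, `matmul`, `mcl`), the added self-loops, the inflation value of 2.0, the convergence tolerance, the 1e-6 threshold for reading off clusters, and the six-protein similarity matrix are all illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Rescale each column to sum to 1 (a column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / col_sum if col_sum else 0.0
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-6) -> List[List[int]]:
    """Cluster the nodes of a symmetric similarity graph with Markov clustering.

    All defaults are illustrative assumptions, not values from the paper.
    """
    n = len(similarity)
    # Add self-loops so every node keeps some flow on itself, then normalize.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        # Expansion: squaring the matrix spreads flow along two-step walks.
        expanded = matmul(m, m)
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong flows and weakens weak ones.
        inflated = normalize_columns([[x ** inflation for x in row]
                                      for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that retain mass are "attractors"; the columns an attractor holds
    # form one cluster. Attractors in the same cluster normally hold the same
    # columns, so identical member sets are reported only once.
    clusters: List[List[int]] = []
    seen = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Hypothetical similarity scores for six proteins: two tight groups
    # joined by one weak edge between proteins 2 and 3.
    w = 0.1
    sim = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, w, 0, 0],
        [0, 0, w, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(sim))
```

Expansion and inflation are the two alternating operations that define MCL. Raising the inflation exponent generally yields smaller, tighter clusters, which is the main knob one would tune when deciding how finely to split protein families.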
