Phylogeny Reconstruction with Alignment-Free Method That Corrects for Horizontal Gene Transfer.

Authors

Bromberg Raquel, Grishin Nick V, Otwinowski Zbyszek

Affiliations

Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, United States of America.

Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, United States of America.

Publication

PLoS Comput Biol. 2016 Jun 23;12(6):e1004985. doi: 10.1371/journal.pcbi.1004985. eCollection 2016 Jun.

Abstract

Advances in sequencing have generated a large number of complete genomes. Traditionally, phylogenetic analysis relies on alignments of orthologs, but defining orthologs and separating them from paralogs is a complex task that may not always be suited to the large datasets of the future. Whole-genome, alignment-free methods offer an alternative to traditional, alignment-based approaches. These methods are scalable and require minimal manual intervention. We developed SlopeTree, a new alignment-free method that estimates evolutionary distances by measuring the decay of exact substring matches as a function of match length. SlopeTree corrects for horizontal gene transfer, for composition variation and low-complexity sequences, and for branch-length nonlinearity caused by multiple mutations at the same site. We tested SlopeTree on 495 bacteria, 73 archaea, and 72 strains of Escherichia coli and Shigella. We compared our trees to the NCBI taxonomy, to trees based on concatenated alignments, and to trees produced by other alignment-free methods. The results were consistent with current knowledge about prokaryotic evolution. We assessed differences in tree topology across methods and settings and found that the majority of bacteria and archaea have a core set of proteins that evolves by descent. In trees built from complete genomes rather than sets of core genes, we observed some grouping by phenotype rather than phylogeny, for instance a cluster of sulfur-reducing thermophilic bacteria coming together irrespective of their phyla. The source code for SlopeTree is available at: http://prodata.swmed.edu/download/pub/slopetree_v1/slopetree.tar.gz.
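The core idea in the abstract, estimating divergence from how quickly exact substring matches decay as match length grows, can be illustrated with a small sketch. This is not SlopeTree's implementation: the function names (`kmer_counts`, `shared_matches`, `decay_slope`), the choice of k range, and the plain least-squares fit are all assumptions made for illustration, and none of the corrections the abstract mentions (horizontal gene transfer, composition and low-complexity filtering, branch-length nonlinearity) are attempted here.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count every length-k substring across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def shared_matches(seqs_a, seqs_b, k):
    """Number of exact length-k matches between two sequence sets
    (pairs of positions carrying the same k-mer)."""
    a = kmer_counts(seqs_a, k)
    b = kmer_counts(seqs_b, k)
    return sum(n * b[kmer] for kmer, n in a.items() if kmer in b)


def decay_slope(seqs_a, seqs_b, k_values):
    """Least-squares slope of log(match count) against match length k.

    Hypothetical simplification: a steeper (more negative) slope is read
    as greater divergence. Lengths with zero matches are skipped.
    """
    xs, ys = [], []
    for k in k_values:
        m = shared_matches(seqs_a, seqs_b, k)
        if m > 0:
            xs.append(k)
            ys.append(math.log(m))
    if len(xs) < 2:
        raise ValueError("need at least two match lengths with matches")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    original = ["ACDEFGHIKLMNPQRSTVWY"]
    mutated = ["ACDEFGHIKLANPQRSTVWY"]  # one substitution (M -> A)
    ks = range(2, 7)
    print("self vs self:   ", decay_slope(original, original, ks))
    print("self vs mutated:", decay_slope(original, mutated, ks))
```

In this toy case a single substitution removes every match window that spans the changed residue, so longer matches are lost faster and the fitted slope becomes more negative than in the self-comparison. Turning such slopes into tree-ready distances would require the corrections described in the abstract.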
