In silico phylogenomics using complete genomes: a case study on the evolution of hominoids.

Author information

Costa Igor Rodrigues, Prosdocimi Francisco, Jennings W Bryan

Affiliations

Laboratório de Genômica e Biodiversidade, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, 21941-902, Brazil;

Departamento de Vertebrados, Museu Nacional, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, 20940-040, Brazil.

Publication information

Genome Res. 2016 Sep;26(9):1257-67. doi: 10.1101/gr.203950.115. Epub 2016 Jul 19.

DOI:10.1101/gr.203950.115

PMID:27435933

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5052044/

Abstract

The increasing availability of complete genome data is facilitating the acquisition of phylogenomic data sets, but the process of obtaining orthologous sequences from other genomes and assembling multiple sequence alignments remains piecemeal and arduous. We designed software that performs these tasks and outputs anonymous loci (AL) or anchored enrichment/ultraconserved element loci (AE/UCE) data sets in ready-to-analyze formats. We demonstrate our program by applying it to the hominoids. Starting with human, chimpanzee, gorilla, and orangutan genomes, our software generated an exhaustive data set of 292 ALs (∼1 kb each) in ∼3 h. Not only did analyses of our AL data set validate the program by yielding a portrait of hominoid evolution in agreement with previous studies, but the accuracy and precision of our estimated ancestral effective population sizes and speciation times represent improvements. We also used our program with a published set of 512 vertebrate-wide AE "probe" sequences to generate data sets consisting of 171 and 242 independent loci (∼1 kb each) in 11 and 13 min, respectively. The former data set consisted of flanking sequences 500 bp from adjacent AEs, while the latter contained sequences bordering AEs. Although our AE data sets produced the expected hominoid species tree, coalescent-based estimates of ancestral population sizes and speciation times based on these data were considerably lower than estimates from our AL data set and previous studies. Accordingly, we suggest that loci subjected to direct or indirect selection may not be appropriate for coalescent-based methods. Complete in silico approaches, combined with the burgeoning genome databases, will accelerate the pace of phylogenomics.
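The two AE data sets described in the abstract differ in where the ∼1 kb locus sits relative to the anchor element: either bordering it or 500 bp away. The coordinate arithmetic behind that distinction can be sketched as below. This is only an illustration under stated assumptions, not the authors' software: the `Interval` type, the `flanking_window` function, the 0-based half-open coordinate convention, and the example coordinates are all hypothetical.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Half-open interval [start, end) on one chromosome, 0-based (assumed convention)."""
    start: int
    end: int


def flanking_window(anchor: Interval, offset: int, length: int,
                    side: str, chrom_len: int) -> Optional[Interval]:
    """Return a window of `length` bp lying `offset` bp away from the anchor.

    side="down" places the window after the anchor, side="up" before it.
    offset=0 gives a window bordering the anchor; offset=500 leaves a 500 bp gap.
    Returns None if the window would run off either end of the chromosome.
    """
    if side == "down":
        start = anchor.end + offset
    elif side == "up":
        start = anchor.start - offset - length
    else:
        raise ValueError("side must be 'up' or 'down'")
    end = start + length
    if start < 0 or end > chrom_len:
        return None
    return Interval(start, end)


# Hypothetical anchor element at positions 10,000-10,120 on a 50 kb sequence.
ae = Interval(10_000, 10_120)
bordering = flanking_window(ae, offset=0, length=1000, side="down", chrom_len=50_000)
spaced = flanking_window(ae, offset=500, length=1000, side="down", chrom_len=50_000)
print(bordering)  # Interval(start=10120, end=11120)
print(spaced)     # Interval(start=10620, end=11620)
```

A real pipeline would also need to locate each probe in every genome, confirm orthology, and align the extracted windows; none of that is shown here.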

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/642d/5052044/18077452f575/1257f01.jpg
