Bravo Gustavo A, Antonelli Alexandre, Bacon Christine D, Bartoszek Krzysztof, Blom Mozes P K, Huynh Stella, Jones Graham, Knowles L Lacey, Lamichhaney Sangeet, Marcussen Thomas, Morlon Hélène, Nakhleh Luay K, Oxelman Bengt, Pfeil Bernard, Schliep Alexander, Wahlberg Niklas, Werneck Fernanda P, Wiedenhoeft John, Willows-Munro Sandi, Edwards Scott V
Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA.
Gothenburg Global Biodiversity Centre, Göteborg, Sweden.
PeerJ. 2019 Feb 14;7:e6399. doi: 10.7717/peerj.6399. eCollection 2019.
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. We focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation (supermatrix) approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which phylogeny building increasingly relies, and we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.