Confronting Sources of Systematic Error to Resolve Historically Contentious Relationships: A Case Study Using Gadiform Fishes (Teleostei, Paracanthopterygii, Gadiformes).

Affiliations

National Systematics Laboratory of the National Oceanic and Atmospheric Administration Fisheries Service, 10th St. & Constitution Ave. NW, Washington, DC 20560, USA.

Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, 10th St. & Constitution Ave. NW, Washington, DC 20560, USA.

Publication information

Syst Biol. 2021 Jun 16;70(4):739-755. doi: 10.1093/sysbio/syaa095.

Abstract

Reliable estimation of phylogeny is central to avoiding inaccuracy in downstream macroevolutionary inferences. However, limitations exist in the implementation of concatenated and summary coalescent approaches, and Bayesian and full coalescent inference methods may not yet be computationally feasible with complicated models and large data sets. Here, using the gadiform fishes as a model clade, we explored methodological (e.g., optimality criteria, character sampling, model selection) and biological (e.g., heterotachy, branch length heterogeneity) sources of systematic error that can result in biased or incorrect parameter estimates when reconstructing phylogeny. Gadiformes include some of the most economically important fishes in the world (e.g., Cods, Hakes, and Rattails). Despite many attempts, a robust higher-level phylogenetic framework was lacking due to limited character and taxonomic sampling, particularly from several species-poor families that have been recalcitrant to phylogenetic placement. We compiled the first phylogenomic data set, including 14,208 loci (>2.8 Mbp) from 58 species representing all recognized gadiform families, to infer a time-calibrated phylogeny for the group. Data were generated with a gene-capture approach targeting coding DNA sequences from single-copy protein-coding genes. Species-tree and concatenated maximum-likelihood (ML) analyses resolved all family-level relationships within Gadiformes. While there were a few differences between topologies produced by the DNA and the amino acid data sets, most of the historically unresolved relationships among gadiform lineages were consistently well resolved with high support in our analyses regardless of the methodological and biological approaches used. However, at deeper levels, we observed inconsistency in branch support estimates between bootstrap and gene and site concordance factors (gCF, sCF). Despite numerous short internodes, all relationships received unequivocal bootstrap support while gCF and sCF had very little support, reflecting hidden conflict across loci. Most of the gene-tree and species-tree discordance in our study is a result of short divergence times, and a consequent lack of informative characters at deep levels, rather than incomplete lineage sorting. We use this phylogeny to establish a new higher-level classification of Gadiformes as a way of clarifying the evolutionary diversification of the order. We recognize 17 families in five suborders: Bregmacerotoidei, Gadoidei, Ranicipitoidei, Merluccioidei, and Macrouroidei (including two subclades). A time-calibrated analysis using 15 fossil taxa suggests that Gadiformes evolved ~79.5 Ma in the late Cretaceous, but that most extant lineages diverged after the Cretaceous-Paleogene (K-Pg) mass extinction (66 Ma). Our results reiterate the importance of examining phylogenomic analyses for evidence of systematic error that can emerge as a result of unsuitable modeling of biological factors and/or methodological issues, even when data sets are large and yield high support for phylogenetic relationships. [Branch length heterogeneity; Codfishes; commercial fish species; Cretaceous-Paleogene (K-Pg); heterotachy; systematic error; target enrichment.]
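The contrast between bootstrap support and gCF reported above can be illustrated with a toy calculation. The sketch below is not the authors' pipeline; it is a simplified, rooted-clade version of a gene concordance factor, assuming gCF for a branch is the fraction of decisive gene trees (those containing every taxon of the clade) in which that clade appears. The taxon labels `A` to `D`, the ten gene trees, and the function names are all hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (hypothetical toy encoding).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Tuple[FrozenSet[str], Set[FrozenSet[str]]]:
    """Return (leaf set, set of all clades) for a nested-tuple tree."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, {leaves}
    leaves: FrozenSet[str] = frozenset()
    found: Set[FrozenSet[str]] = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def gene_concordance(clade: Iterable[str], gene_trees: List[Tree]) -> float:
    """Fraction of decisive gene trees that contain `clade` (simplified gCF)."""
    target = frozenset(clade)
    decisive = 0
    concordant = 0
    for tree in gene_trees:
        leaves, found = clades(tree)
        if not target <= leaves:
            continue  # this locus lacks some taxa, so it cannot test the clade
        decisive += 1
        if target in found:
            concordant += 1
    return concordant / decisive if decisive else float("nan")


# Hypothetical loci: the three resolutions of a short internode are nearly tied.
GENE_TREES: List[Tree] = (
    [((("A", "B"), "C"), "D")] * 4
    + [((("A", "C"), "B"), "D")] * 3
    + [((("B", "C"), "A"), "D")] * 3
)

gcf_ab = gene_concordance({"A", "B"}, GENE_TREES)
gcf_abc = gene_concordance({"A", "B", "C"}, GENE_TREES)
print(f"gCF(A,B) = {gcf_ab:.2f}, gCF(A,B,C) = {gcf_abc:.2f}")
```

In this toy data the clade (A, B) is the most frequent resolution but appears in only 40% of gene trees. A concatenated analysis could still favor that branch consistently across bootstrap replicates, so the two measures answer different questions: bootstrap reflects sampling variance, while gCF reflects how many loci actually agree.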

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dee3/8561434/dfd158f1fc30/syaa095f1.jpg
