Separating the wheat from the chaff: mitigating the effects of noise in a plastome phylogenomic data set from Pinus L. (Pinaceae).

Affiliation

Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331-2902, USA.

Publication information

BMC Evol Biol. 2012 Jun 25;12:100. doi: 10.1186/1471-2148-12-100.

Abstract

BACKGROUND

Through next-generation sequencing, the amount of sequence data potentially available for phylogenetic analyses has increased exponentially in recent years. The risk of incorporating 'noisy' data with misleading phylogenetic signal has increased at the same time, and such data may disproportionately influence the topology of weakly supported nodes and of lineages featuring rapid radiations and/or elevated rates of evolution.

RESULTS

We investigated the influence of phylogenetic noise in large data sets by applying two fundamental strategies, variable site removal and long-branch exclusion, to the phylogenetic analysis of a full plastome alignment of 107 species of Pinus and six Pinaceae outgroups. While high overall phylogenetic resolution resulted from inclusion of all data, three historically recalcitrant nodes remained conflicted with previous analyses. Close investigation of these nodes revealed dramatically different responses to data removal. Whereas topological resolution and bootstrap support for two clades peaked with removal of highly variable sites, the third clade resolved most strongly when all sites were included. Similar trends were observed using long-branch exclusion, but patterns were neither as strong nor as clear. When compared to previous phylogenetic analyses of nuclear loci and morphological data, the most highly supported topologies seen in Pinus plastome analysis are congruent for the two clades gaining support from variable site removal and long-branch exclusion, but in conflict for the clade with highest support from the full data set.
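To make the variable site removal strategy concrete, here is a minimal sketch of dropping the most variable columns from a nucleotide alignment. The abstract does not say how site variability was measured or which removal thresholds were used, so the Shannon entropy score, the `fraction` parameter, and the function names `site_entropy` and `remove_variable_sites` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from math import log2


def site_entropy(column):
    """Shannon entropy (bits) of one alignment column.

    Assumption: entropy is used here as a stand-in variability score;
    gaps and ambiguity codes are ignored.
    """
    counts = Counter(c for c in column.upper() if c in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())


def remove_variable_sites(alignment, fraction):
    """Drop the given fraction of columns with the highest entropy.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    fraction:  hypothetical threshold, share of columns to remove (0..1).
    Returns (filtered alignment, list of kept column indices).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all sequences must have the same aligned length")

    scores = [
        site_entropy("".join(alignment[n][i] for n in names))
        for i in range(length)
    ]
    n_drop = int(length * fraction)
    # Rank columns from most to least variable; ties broken by position.
    ranked = sorted(range(length), key=lambda i: (-scores[i], i))
    dropped = set(ranked[:n_drop])
    kept = [i for i in range(length) if i not in dropped]
    filtered = {n: "".join(alignment[n][i] for i in kept) for n in names}
    return filtered, kept


# Toy alignment: column 2 is the most variable, column 1 the next.
toy = {"t1": "AAAA", "t2": "AACA", "t3": "AGTA", "t4": "AGGA"}
filtered, kept = remove_variable_sites(toy, 0.25)
```

In practice one would repeat the tree inference over a series of increasing `fraction` values and track bootstrap support at the nodes of interest, which is the kind of comparison the results above describe.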

CONCLUSIONS

These results suggest that removing misleading signal from phylogenomic data sets can not only increase resolution for poorly supported nodes but may also serve as a tool for identifying erroneous yet highly supported topologies. For Pinus chloroplast genomes, removal of variable sites appears to be more effective than long-branch exclusion for clarifying phylogenetic hypotheses.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc8c/3475122/96efeb4cc407/1471-2148-12-100-1.jpg
