Phylogenomics of Annelida revisited: a cladistic approach using genome-wide expressed sequence tag data mining and examining the effects of missing data.

Author information

Kvist Sebastian, Siddall Mark E

Affiliations

Richard Gilder Graduate School, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024, USA.

Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024, USA.

Publication information

Cladistics. 2013 Aug;29(4):435-448. doi: 10.1111/cla.12015. Epub 2013 Feb 22.

Abstract

We present phylogenomic analyses of the most comprehensive molecular character set compiled for Annelida and its constituent taxa, including over 347 000 aligned nucleotide sites for 39 taxa. The nucleotide data set was recovered using a pre-existing amino acid data set of almost 48 000 aligned sites as a backbone for tBLASTn searches against NCBI. In addition, orthology determinations of the loci in the original amino acid data set were scrutinized using an All vs All Reciprocal Best Hit approach, employing BLASTp, and examining for statistical interdependency among the loci. This approach revealed considerable sequence redundancy among the loci in the original data set and a new data set was compiled, with the redundancy removed. The newly compiled nucleotide data set, the original amino acid data set, and the new reduced amino acid data set were subjected to parsimony analyses and two forms of bootstrap resampling. The last-named data set also was analysed using a maximum-likelihood approach. There were two main objectives to these analyses: (i) to examine the general topology, including support, resulting from the analyses of the new data sets and (ii) to assess the consistency of the branching patterns across optimality criteria by comparison with previous probabilistic approaches. The phylogenetic hypotheses resulting from analyses of the three data sets are largely unsupported, reflecting the continued difficulty of finding numerous, reliable, and suitable loci for a group as ancient as Annelida. Resulting parsimonious hypotheses disagree, in some respects, with the previous probabilistic approaches; Sedentaria and, in most cases, Errantia are not supported as monophyletic groups but Pleistoannelida is recovered as an (unsupported) monophyletic group in one of the three parsimony analyses as well as the likelihood analysis. In addition, we performed missing data titration studies to estimate the impact of missing data on overall support and support for specific clades.
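
The All vs All Reciprocal Best Hit step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `(query, subject, bit_score)` tuple layout, and the locus identifiers are hypothetical, and the scores are made-up illustrative values standing in for parsed BLASTp output. Two loci are reported as a reciprocal pair only when each is the other's top-scoring hit.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query == subject:
            continue  # ignore self-hits from an all-vs-all search
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(hits: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return unordered pairs whose members are each other's best hit."""
    top = best_hits(hits)
    return {
        tuple(sorted((query, subject)))
        for query, subject in top.items()
        if top.get(subject) == query
    }


# Illustrative values only, not data from the study.
example_hits = [
    ("locusA", "locusB", 250.0),
    ("locusA", "locusC", 90.0),
    ("locusB", "locusA", 245.0),
    ("locusC", "locusA", 88.0),
    ("locusC", "locusB", 40.0),
]
print(reciprocal_best_hits(example_hits))  # {('locusA', 'locusB')}
```

In this toy input, locusC's best hit is locusA, but locusA's best hit is locusB, so only the locusA/locusB pair is reciprocal. How such pairs are then interpreted (for example, as redundant copies of one locus) depends on the study design and is not specified here.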
