Novel distances for Dollo data.

Affiliation

School of Mathematics and Physics, CRC for Forestry, School of Plant Science, University of Tasmania, Private Bag 55, Hobart 7001, Australia.

Publication information

Syst Biol. 2013 Jan 1;62(1):62-77. doi: 10.1093/sysbio/sys071. Epub 2012 Aug 22.

Abstract

We investigate distances on binary (presence/absence) data in the context of a Dollo process, where a trait can only arise once on a phylogenetic tree but may be lost many times. We introduce a novel distance, the Additive Dollo Distance (ADD), that applies to data generated under a Dollo model and show that it has some useful theoretical properties including an intriguing link to the LogDet/paralinear distance. Simulations of Dollo data are used to compare a number of binary distances including ADD, LogDet, a restriction-site-based distance, and some simple, but to our knowledge previously unstudied, variations on common binary distances. The simulations suggest that ADD outperforms other distances on Dollo data. Interestingly, we found that the LogDet distance performs poorly in the context of a Dollo process; this may have implications for its use in connection with conditioned genome reconstruction. We apply the ADD to two Diversity Arrays Technology data sets, one that broadly covers Eucalyptus species and one that focuses on the Eucalyptus series Adnataria. We also reanalyze gene family presence/absence data from bacterial genomes obtained from the COG database and compare the results with previous phylogenies estimated using the conditioned genome reconstruction approach. The results for these case studies are largely congruent with previous studies, in some cases giving more phylogenetic resolution.
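The abstract links ADD to the LogDet/paralinear distance but gives neither formula. For reference only, here is a minimal sketch of the standard paralinear (LogDet) distance between two binary presence/absence profiles. It assumes k = 2 character states and the scaling d = -(1/k) ln(det J / sqrt(det Dx · det Dy)). This is not the paper's ADD, whose formula is not stated here. The function name `paralinear_binary` and the choice to return infinity when the distance is undefined are our own assumptions.

```python
import math
from typing import Sequence


def paralinear_binary(x: Sequence[int], y: Sequence[int]) -> float:
    """Paralinear/LogDet distance between two 0/1 presence/absence profiles.

    Assumed form (k = 2 states):
        d = -(1/2) * ln( det(J) / sqrt(det(Dx) * det(Dy)) )
    J is the 2x2 joint frequency matrix of states; Dx and Dy are diagonal
    matrices of each profile's marginal state frequencies.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)

    # Joint frequencies: J[i][j] = fraction of characters with x = i and y = j.
    J = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in zip(x, y):
        J[a][b] += 1.0 / n
    det_J = J[0][0] * J[1][1] - J[0][1] * J[1][0]

    # Marginal frequencies: row sums for x, column sums for y.
    px = (J[0][0] + J[0][1], J[1][0] + J[1][1])
    py = (J[0][0] + J[1][0], J[0][1] + J[1][1])
    det_Dx = px[0] * px[1]
    det_Dy = py[0] * py[1]

    # Undefined when a profile is constant or the joint determinant is <= 0;
    # this sketch (hypothetically) reports that as an infinite distance.
    if det_J <= 0 or det_Dx == 0 or det_Dy == 0:
        return math.inf

    # Written as +1/2 * ln(sqrt(...) / det J) so identical profiles give +0.0.
    return 0.5 * math.log(math.sqrt(det_Dx * det_Dy) / det_J)


if __name__ == "__main__":
    a = [1, 1, 0, 0, 1, 0]
    b = [1, 0, 0, 0, 1, 1]
    print(paralinear_binary(a, b))  # equals 0.5 * ln(3) for these profiles
```

The distance is symmetric, and identical profiles give 0. A profile in which the trait is present (or absent) in every taxon makes a marginal determinant zero, so the distance is undefined for that pair.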
