Multi-rate Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo.

Author information

Kapli P, Lutteropp S, Zhang J, Kobert K, Pavlidis P, Stamatakis A, Flouri T

Affiliations

The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Department of Informatics, Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Publication information

Bioinformatics. 2017 Jun 1;33(11):1630-1638. doi: 10.1093/bioinformatics/btx025.

Abstract

MOTIVATION

In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys because they enable fast species discovery and biodiversity estimates. Among them, distance-based methods are the most common choice because they scale well to large datasets; however, they are sensitive to similarity-threshold parameters and ignore evolutionary relationships. The recently introduced "Poisson Tree Processes" (PTP) method is a phylogeny-aware approach that does not rely on such thresholds. Yet two weaknesses of PTP limit its accuracy and practicality on large datasets: it does not account for divergent levels of intraspecific variation, and it is slow for large numbers of sequences.

RESULTS

We introduce the multi-rate PTP (mPTP), an improved method that alleviates the theoretical and technical shortcomings of PTP. It accommodates different levels of intraspecific genetic diversity arising from differences in either the evolutionary history or the sampling of each species. Results on empirical data suggest that mPTP is superior to PTP and to popular distance-based methods, as it consistently yields delimitations that agree more closely with the taxonomy (i.e., it identifies more taxonomic species and infers species numbers closer to the taxonomic count). Moreover, mPTP does not require any similarity threshold as input. The novel dynamic programming algorithm attains a speedup of at least five orders of magnitude compared to PTP, allowing species delimitation in large (meta-)barcoding datasets. In addition, Markov chain Monte Carlo sampling provides a comprehensive evaluation of the inferred delimitation, running millions of steps in just a few seconds, independently of tree size.
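
The abstract does not spell out the model, but PTP-style methods treat branch lengths as exponentially distributed, with one rate for the branches between species and, in the multi-rate case, a separate rate for the branches inside each species. The sketch below is a simplified illustration under that assumption, not the mptp implementation. It only scores one fixed delimitation, and the function names (`exp_mle_loglik`, `multi_rate_loglik`, `single_rate_loglik`) and toy branch lengths are hypothetical. It contrasts a pooled within-species rate (single-rate, PTP-like) with one fitted rate per species (multi-rate).

```python
import math
from typing import Dict, List


def exp_mle_loglik(lengths: List[float]) -> float:
    """Log-likelihood of branch lengths under a single exponential
    distribution whose rate is set to its maximum-likelihood estimate,
    rate = n / sum(lengths)."""
    n = len(lengths)
    if n == 0:
        return 0.0  # an empty class contributes nothing
    total = sum(lengths)
    if total <= 0:
        raise ValueError("branch lengths must sum to a positive value")
    rate = n / total
    # sum_i log(rate * exp(-rate * x_i)) = n * log(rate) - rate * total
    return n * math.log(rate) - rate * total


def multi_rate_loglik(speciation: List[float],
                      species: Dict[str, List[float]]) -> float:
    """Hypothetical multi-rate score: one rate for speciation branches,
    plus a separate rate for the branches inside each delimited species."""
    score = exp_mle_loglik(speciation)
    for lengths in species.values():
        score += exp_mle_loglik(lengths)
    return score


def single_rate_loglik(speciation: List[float],
                       species: Dict[str, List[float]]) -> float:
    """Hypothetical single-rate score: all within-species branches
    share one pooled rate, as in a PTP-like formulation."""
    pooled = [x for lengths in species.values() for x in lengths]
    return exp_mle_loglik(speciation) + exp_mle_loglik(pooled)


if __name__ == "__main__":
    # Toy branch lengths (substitutions per site), invented for illustration.
    speciation = [0.05, 0.08, 0.06]
    species = {
        "A": [0.001, 0.002, 0.0015],  # low intraspecific diversity
        "B": [0.02, 0.03, 0.025],     # high intraspecific diversity
    }
    print("multi-rate :", multi_rate_loglik(speciation, species))
    print("single-rate:", single_rate_loglik(speciation, species))
```

Because each per-species rate is fitted separately, the multi-rate score for a given delimitation is never lower than the pooled single-rate score. A real delimitation method must also search over candidate delimitations and account for the extra rate parameters, which this sketch does not attempt.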

AVAILABILITY AND IMPLEMENTATION

mPTP is implemented in C and is available for download at http://github.com/Pas-Kapli/mptp under the GNU Affero 3 license. A web-service is available at http://mptp.h-its.org.

CONTACT

paschalia.kapli@h-its.org or alexandros.stamatakis@h-its.org or tomas.flouri@h-its.org.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
