Multi-rate Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo.

Author information

Kapli P, Lutteropp S, Zhang J, Kobert K, Pavlidis P, Stamatakis A, Flouri T

Affiliations

The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Department of Informatics, Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Publication information

Bioinformatics. 2017 Jun 1;33(11):1630-1638. doi: 10.1093/bioinformatics/btx025.

PMID:28108445

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5447239/

Abstract

MOTIVATION

In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are particularly important in large-scale surveys because they enable fast species discovery and biodiversity estimates. Among these, distance-based methods are the most common choice because they scale well to large datasets; however, they are sensitive to similarity threshold parameters and ignore evolutionary relationships. The recently introduced "Poisson Tree Processes" (PTP) method is a phylogeny-aware approach that does not rely on such thresholds. Yet two weaknesses of PTP limit its accuracy and practicality on large datasets: it does not account for divergent intraspecific variation, and it is slow for large numbers of sequences.

RESULTS

We introduce the multi-rate PTP (mPTP), an improved method that alleviates the theoretical and technical shortcomings of PTP. It accommodates different levels of intraspecific genetic diversity arising from differences in either the evolutionary history or the sampling of each species. Results on empirical data suggest that mPTP is superior to PTP and to popular distance-based methods, as it consistently yields delimitations that agree more closely with the taxonomy (i.e., it identifies more taxonomic species and infers species numbers closer to the taxonomic count). Moreover, mPTP does not require any similarity threshold as input. The novel dynamic programming algorithm attains a speedup of at least five orders of magnitude over PTP, allowing it to delimit species in large (meta-)barcoding datasets. In addition, Markov chain Monte Carlo sampling provides a comprehensive evaluation of the inferred delimitation, running millions of steps in just a few seconds, independently of tree size.
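To illustrate the difference between one shared within-species rate (PTP) and per-species rates (mPTP), here is a minimal Python sketch, not the mPTP implementation. It assumes that, for a fixed delimitation, speciation branch lengths and each species' within-species branch lengths are exponentially distributed, with each rate set to its maximum-likelihood value n/S (n branches summing to S). The branch classification is passed in by hand; the function names and example branch lengths are hypothetical, and the sketch omits both the dynamic programming search over delimitations and the MCMC sampler.

```python
import math
from typing import List, Sequence


def exp_loglik(lengths: Sequence[float]) -> float:
    """Log-likelihood of branch lengths under an exponential distribution,
    with the rate set to its maximum-likelihood estimate n / sum(lengths)."""
    n = len(lengths)
    if n == 0:
        return 0.0  # an empty class contributes nothing
    total = sum(lengths)
    if total <= 0:
        raise ValueError("branch lengths must sum to a positive value")
    rate = n / total
    # sum of log(rate * exp(-rate * x)) over all x = n*log(rate) - rate*total
    return n * math.log(rate) - rate * total


def single_rate_loglik(speciation: Sequence[float],
                       species: List[Sequence[float]]) -> float:
    """PTP-style score (assumed form): one rate for speciation branches and
    one rate shared by the within-species branches of every species."""
    pooled = [x for branches in species for x in branches]
    return exp_loglik(speciation) + exp_loglik(pooled)


def multi_rate_loglik(speciation: Sequence[float],
                      species: List[Sequence[float]]) -> float:
    """mPTP-style score (assumed form): one rate for speciation branches and
    a separate rate for the within-species branches of each species."""
    return exp_loglik(speciation) + sum(exp_loglik(b) for b in species)


if __name__ == "__main__":
    # Hypothetical branch lengths (substitutions per site) for one delimitation.
    speciation = [0.05, 0.08, 0.06, 0.07]
    species = [
        [0.001, 0.002, 0.0015],  # species with little intraspecific variation
        [0.02, 0.03, 0.025],     # species with divergent intraspecific variation
    ]
    print("single-rate:", single_rate_loglik(speciation, species))
    print("multi-rate: ", multi_rate_loglik(speciation, species))
```

Because the multi-rate score has one free rate per species and the single-rate model is a constrained special case, it is never lower than the single-rate score for the same delimitation; the gap grows when one species carries much more divergent intraspecific variation than another. Comparing delimitations with different numbers of species therefore needs some correction for the extra parameters, which this sketch does not attempt.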

AVAILABILITY AND IMPLEMENTATION

mPTP is implemented in C and is available for download at http://github.com/Pas-Kapli/mptp under the GNU Affero 3 license. A web-service is available at http://mptp.h-its.org .

CONTACT

paschalia.kapli@h-its.org or alexandros.stamatakis@h-its.org or tomas.flouri@h-its.org.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

