Mutanen Marko, Kivelä Sami M, Vos Rutger A, Doorenweerd Camiel, Ratnasingham Sujeevan, Hausmann Axel, Huemer Peter, Dincă Vlad, van Nieukerken Erik J, Lopez-Vaamonde Carlos, Vila Roger, Aarvik Leif, Decaëns Thibaud, Efetov Konstantin A, Hebert Paul D N, Johnsen Arild, Karsholt Ole, Pentinsaari Mikko, Rougerie Rodolphe, Segerer Andreas, Tarmann Gerhard, Zahiri Reza, Godfray H Charles J
Department of Genetics and Physiology, University of Oulu, Finland;
Department of Ecology, University of Oulu, Finland.
Syst Biol. 2016 Nov;65(6):1024-1040. doi: 10.1093/sysbio/syw044. Epub 2016 Jun 10.
The proliferation of DNA data is revolutionizing all fields of systematic research. DNA barcode sequences, now available for millions of specimens and several hundred thousand species, are increasingly used in algorithmic species delimitations. This is complicated by occasional incongruences between species and gene genealogies, as indicated by situations where conspecific individuals do not form a monophyletic cluster in a gene tree. In two previous reviews, non-monophyly has been reported as being common in mitochondrial DNA gene trees. We developed a novel web service, "Monophylizer", to detect non-monophyly in phylogenetic trees and used it to ascertain the incidence of species non-monophyly in COI (a.k.a. cox1) barcode sequence data from 4977 species and 41,583 specimens of European Lepidoptera, the largest data set of DNA barcodes analyzed in this regard. Particular attention was paid to accurate species identification to ensure data integrity. We investigated the effects of tree-building method, sampling effort, and other methodological issues, all of which can influence estimates of non-monophyly. We found a 12% incidence of non-monophyly, a value significantly lower than that observed in previous studies. Neighbor joining (NJ) and maximum likelihood (ML) methods yielded almost equal numbers of non-monophyletic species, but 24.1% of these cases of non-monophyly were found by only one of these methods. Non-monophyletic species tend to show either low genetic distances to their nearest neighbors or exceptionally high levels of intraspecific variability. Cases of polyphyly in COI trees arising as a result of deep intraspecific divergence are negligible, as the detected cases reflected misidentifications or methodological errors. Taking into consideration variation in sampling effort, we estimate that the true incidence of non-monophyly is ∼23%, although this estimate still includes operational factors.
Within the operational factors, we separately assessed the frequency of taxonomic limitations (presence of overlooked cryptic and oversplit species) and identification uncertainties. We observed that operational factors are potentially present in more than half (58.6%) of the detected cases of non-monophyly. Furthermore, we observed that in about 20% of non-monophyletic species and entangled species, the lineages involved are either allopatric or parapatric, conditions under which species delimitation is inherently subjective and particularly dependent on the species concept that has been adopted. These observations suggest that species-level non-monophyly in COI gene trees is less common than previously supposed, with many cases reflecting misidentifications, the subjectivity of species delimitation or other operational factors.
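The abstract's core test, whether conspecific individuals form a monophyletic cluster in a gene tree, can be illustrated with a minimal Python sketch. This is not the Monophylizer implementation. The nested-tuple tree representation, the "Species|specimen" tip-label convention, and the function names are assumptions made for illustration, and the tree is assumed to be rooted. A species is flagged when the smallest clade containing all of its specimens also contains specimens of another species.

```python
from collections import defaultdict


def leaf_species(label):
    # Hypothetical tip-label convention: "Species|specimenID".
    return label.split("|")[0]


def collect_clades(tree, clades):
    """Return the tip labels under `tree`; append one tip list per internal node to `clades`."""
    if isinstance(tree, str):
        return [tree]
    tips = []
    for child in tree:
        tips.extend(collect_clades(child, clades))
    clades.append(tips)
    return tips


def non_monophyletic_species(tree):
    """Return the set of species whose specimens do not form a clade in a rooted tree."""
    clades = []
    all_tips = collect_clades(tree, clades)

    by_species = defaultdict(set)
    for tip in all_tips:
        by_species[leaf_species(tip)].add(tip)

    flagged = set()
    for species, members in by_species.items():
        if len(members) < 2:
            continue  # a single specimen cannot be non-monophyletic by itself
        # Clades containing all members are nested, so the smallest one is
        # the clade rooted at their most recent common ancestor.
        containing = [clade for clade in clades if members.issubset(clade)]
        smallest = min(containing, key=len)
        if len(smallest) != len(members):
            flagged.add(species)
    return flagged


# Toy rooted tree written as nested tuples (invented data, not from the study).
example_tree = ((("A|1", "A|2"), ("B|1", ("B|2", "C|1"))), "D|1")
print(non_monophyletic_species(example_tree))
```

In this toy tree, species B is flagged because specimen C|1 nests inside the smallest clade holding both B specimens. The sketch scans every clade for every species, which is quadratic in tree size; a data set on the scale described above would call for a more efficient traversal. It also does not distinguish paraphyly from polyphyly.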