Pos Edwin, Guevara Andino Juan Ernesto, Sabatier Daniel, Molino Jean-François, Pitman Nigel, Mogollón Hugo, Neill David, Cerón Carlos, Rivas Gonzalo, Di Fiore Anthony, Thomas Raquel, Tirado Milton, Young Kenneth R, Wang Ophelia, Sierra Rodrigo, García-Villacorta Roosevelt, Zagt Roderick, Palacios Walter, Aulestia Milton, Ter Steege Hans
Ecology and Biodiversity Group, Utrecht University, Utrecht, the Netherlands; Section Botany, Naturalis Biodiversity Center, Leiden, the Netherlands.
Department of Integrative Biology, University of California Berkeley, California, 94720-3140.
Ecol Evol. 2014 Dec;4(24):4626-36. doi: 10.1002/ece3.1246. Epub 2014 Dec 2.
When studying ecological patterns at large scales, ecologists are often unable to identify all collections. They must then either omit the unidentified records entirely, without knowing the effect of doing so, or pursue very costly and time-consuming efforts to identify them. These "indets" may be of critical importance, but their impact on the reliability of ecological analyses is still poorly known. We investigated the consequences of omitting unidentified records and provide an explanation for the results. We used three large-scale independent datasets (Guyana/Suriname, French Guiana, Ecuador), each consisting of records identified to a valid species name (identified morpho-species, IMS) and a number of unidentified records (unidentified morpho-species, UMS). For each dataset we created a subset containing only the IMS and compared it with the complete dataset containing all morpho-species (AMS = IMS + UMS) in the following analyses: species diversity (Fisher's alpha), similarity of species composition, Mantel test and ordination (NMDS). In addition, we simulated an even larger number of unidentified records for all three datasets and re-analyzed the agreement between similarities with these simulated datasets. For all analyses, results were extremely similar whether the complete datasets or the truncated subsets were used. IMS predicted ≥91% of the variation in AMS in all tests and analyses. Even when a larger fraction of UMS was simulated, IMS predicted the results for AMS rather well. Using only IMS also outperformed using higher-taxon data (genus-level identification) for similarity analyses. The high congruence found in all analyses when using IMS rather than AMS suggests that patterns of similarity and composition are very robust. In other words, having a large number of unidentified species in a dataset may not affect our conclusions as much as is often thought.
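The IMS-versus-AMS comparison of compositional similarity can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the similarity index, so Bray-Curtis is an assumption here, and the plot names, species labels, counts and the `ums_` prefix are invented toy values. The sketch computes pairwise plot similarities once with all morpho-species and once with the identified ones only, then reports the squared Pearson correlation between the two sets of values, which corresponds to the Mantel statistic without its permutation test.

```python
from itertools import combinations
from math import sqrt


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity between two plots given as {species: count} dicts."""
    species = set(a) | set(b)
    shared = sum(min(a.get(s, 0), b.get(s, 0)) for s in species)
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0


def pairwise_similarities(plots):
    """Similarities for every pair of plots, in a fixed (sorted) pair order."""
    names = sorted(plots)
    return [bray_curtis_similarity(plots[p], plots[q])
            for p, q in combinations(names, 2)]


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


def drop_unidentified(plots, is_identified):
    """Keep only the records whose species label passes is_identified."""
    return {name: {s: n for s, n in counts.items() if is_identified(s)}
            for name, counts in plots.items()}


# Hypothetical toy data: "ums_*" labels stand for unidentified morpho-species.
ams = {
    "plot_A": {"sp1": 10, "sp2": 5, "ums_1": 2},
    "plot_B": {"sp1": 8, "sp3": 6, "ums_1": 1},
    "plot_C": {"sp2": 7, "sp3": 9, "ums_2": 3},
    "plot_D": {"sp1": 4, "sp2": 4, "sp3": 4},
}
ims = drop_unidentified(ams, lambda s: not s.startswith("ums_"))

sim_ams = pairwise_similarities(ams)
sim_ims = pairwise_similarities(ims)
r = pearson_r(sim_ims, sim_ams)
r_squared = r * r  # share of variation in AMS similarities explained by IMS

print(f"r^2 between IMS and AMS similarities: {r_squared:.3f}")
```

With real plot data, a value of `r_squared` close to 1 would correspond to the high congruence reported above; a significance test would additionally require permuting plot labels, which this sketch omits.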