Lee Sylvia S, Bishop Ian W, Spaulding Sarah A, Mitchell Richard M, Yuan Lester L
U.S. Environmental Protection Agency, Office of Research and Development, National Center for Environmental Assessment, 1200 Pennsylvania Ave. NW, Mail Code 8623-P, Washington, D.C. 20460, USA.
Institute of Arctic and Alpine Research, University of Colorado, Campus Box 450, Boulder, CO 80309, USA.
Ecol Indic. 2019 Jul 1;102:166-174. doi: 10.1016/j.ecolind.2019.01.061.
Diatom data have been collected in large-scale biological assessments in the United States, such as the U.S. Environmental Protection Agency's National Rivers and Streams Assessment (NRSA). However, the effectiveness of diatoms as indicators may suffer if inconsistent taxon identifications across different analysts obscure the relationships between assemblage composition and environmental variables. To reduce these inconsistencies, we harmonized the 2008-2009 NRSA data from nine analysts by updating names to current synonyms and by statistically identifying taxa with high analyst signal (taxa with more variation in relative abundance explained by the analyst factor, relative to environmental variables). We then screened a subset of samples with QA/QC data and combined taxa with mismatching identifications by the primary and secondary analysts. When these combined "slash groups" did not reduce analyst signal, we elevated taxa to the genus level or omitted taxa in difficult species complexes. We examined the variation explained by analyst in the original and revised datasets. Further, we examined how revising the datasets to reduce analyst signal can reduce inconsistency, thereby uncovering the variation in assemblage composition explained by total phosphorus (TP), an environmental variable of high priority for water managers. To produce a revised dataset with the greatest taxonomic consistency, we ultimately made 124 slash groups, omitted 7 taxa in the small naviculoid (e.g., ) species complex, and elevated , , and taxa to the genus level. Relative to the original dataset, the revised dataset had more overlap among samples grouped by analyst in ordination space, less variation explained by the analyst factor, and more than double the variation in assemblage composition explained by TP. 
Elevating all taxa to the genus level did not eliminate analyst signal completely, and analyst remained the most important predictor for the genera , , and , indicating that these taxa present the greatest obstacle to consistent identification in this dataset. Although our process did not completely remove analyst signal, this work provides a method to minimize analyst signal and improve detection of diatom association with TP in large datasets involving multiple analysts. Examination of variation in assemblage data explained by analyst and taxonomic harmonization may be necessary steps for improving data quality and the utility of diatoms as indicators of environmental variables.
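The screening and slash-group steps described above can be illustrated with a minimal sketch. It is not the authors' analysis: the abstract does not name the statistical method used, so the sketch assumes a simple comparison of the variance in a taxon's relative abundance explained by the analyst factor (a one-way between-group sum of squares) against the variance explained by a linear fit to TP. All function names, taxon names, and values below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def r2_group(values, groups):
    """Fraction of variance in `values` explained by a categorical factor."""
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values())
    return ss_between / ss_total


def r2_linear(values, x):
    """R^2 of a simple linear regression of `values` on `x`."""
    mx, my = mean(x), mean(values)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in values)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, values))
    return sxy ** 2 / (sxx * syy)


def high_analyst_signal(abund, analysts, tp):
    """Taxa whose abundance is better explained by analyst than by TP."""
    return sorted(
        taxon for taxon, v in abund.items()
        if r2_group(v, analysts) > r2_linear(v, tp)
    )


def merge_slash_group(abund, members):
    """Sum the member taxa into a single 'A/B' slash group."""
    merged = {t: v for t, v in abund.items() if t not in members}
    merged["/".join(members)] = [sum(col) for col in zip(*(abund[t] for t in members))]
    return merged


# Hypothetical data: six samples, two analysts, relative abundance in percent.
analysts = ["A", "A", "A", "B", "B", "B"]
tp = [10, 20, 30, 10, 20, 30]  # total phosphorus, arbitrary units
abund = {
    "taxon_x": [10, 20, 30, 0, 0, 0],    # name used only by analyst A
    "taxon_y": [0, 0, 0, 10, 20, 30],    # name used only by analyst B
    "taxon_z": [10, 20, 30, 10, 20, 30], # identified consistently
}

flagged = high_analyst_signal(abund, analysts, tp)
revised = merge_slash_group(abund, flagged)
print(flagged)
print(high_analyst_signal(revised, analysts, tp))
```

In this toy case the two names that each analyst applies exclusively are flagged, and combining them into one slash group leaves a taxon whose abundance tracks TP rather than analyst. A real dataset would need a multivariate method and several environmental predictors, which this sketch does not attempt.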