Fietz Katharina, Graves Jeff A, Olsen Morten Tange
Centre for GeoGenetics, Natural History Museum of Denmark, Copenhagen, Denmark.
PLoS One. 2013 Aug 16;8(8):e72853. doi: 10.1371/journal.pone.0072853. eCollection 2013.
Genetic data can provide a powerful tool for those interested in the biology, management and conservation of wildlife, but can also lead to erroneous conclusions if appropriate controls are not applied at every step of the analytical process. This applies particularly to data deposited in public repositories such as GenBank, whose utility relies heavily on the assumption of high data quality. Here we report an in-depth reassessment and comparison of the GenBank and chromatogram mtDNA sequence data generated in a previous study of Baltic grey seals. By re-editing the original chromatogram data, we found that approximately 40% of the grey seal mtDNA haplotype sequences posted in GenBank contained errors. Re-analysis of the edited chromatogram data yielded results and conclusions broadly similar to those of the original study. However, a significantly different outcome was observed when using the uncorrected dataset based on the GenBank haplotypes. We therefore suggest disregarding the existing GenBank data and instead using the corrected haplotypes reported here. Our study serves as an illustrative example of the importance of quality control at every step of a research project, from data generation to interpretation and submission to an online repository. Errors introduced at any step may lead to biased results and conclusions, and could affect management decisions.