Emmett Interdisciplinary Program in Environment & Resources (E-IPER), Stanford University, Stanford, CA, United States of America.
Earth Systems Program, Stanford University, Stanford, CA, United States of America.
PeerJ. 2023 Mar 24;11:e14993. doi: 10.7717/peerj.14993. eCollection 2023.
The emerging field of environmental DNA (eDNA) research lacks universal guidelines for ensuring that the data produced are FAIR (findable, accessible, interoperable, and reusable), despite growing awareness of the importance of such practices. To better understand these data usability challenges, we systematically reviewed 60 peer-reviewed articles conducting a specific subset of eDNA research: metabarcoding studies in marine environments. For each article, we characterized approximately 90 features across several categories: general article attributes and topics, methodological choices, types of metadata included, and availability and storage of sequence data. Analyzing these characteristics, we identified several barriers to data accessibility, including a lack of common context and vocabulary across the articles, missing metadata, supplementary information limitations, and a concentration of both sample collection and analysis in the United States. While some of these barriers require significant effort to address, we also found many instances where small choices made by authors and journals could have an outsized influence on the discoverability and reusability of data. Promisingly, articles also showed consistency and creativity in data storage choices as well as a strong trend toward open access publishing. Our analysis underscores the need to think critically about data accessibility and usability as marine eDNA metabarcoding studies, and eDNA projects more broadly, continue to proliferate.