The archives are half-empty: an assessment of the availability of microbial community sequencing data.

Affiliations

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103, Leipzig, Germany.

Leipzig University, Institute of Biology, Deutscher Platz 5e, 04103, Leipzig, Germany.

Publication information

Commun Biol. 2020 Aug 28;3(1):474. doi: 10.1038/s42003-020-01204-9.

Abstract

As DNA sequencing has become more popular, the public genetic repositories where sequences are archived have experienced explosive growth. These repositories now hold invaluable collections of sequences, e.g., for microbial ecology, but whether these data are reusable has not been evaluated. We assessed the availability and state of 16S rRNA gene amplicon sequences archived in public genetic repositories (SRA, EBI, and DDBJ). We screened 26,927 publications in 17 microbiology journals, identifying 2,015 16S rRNA gene sequencing studies. Of these, 7.2% had not made their data public at the time of analysis. Among a subset of 635 studies sequencing the same gene region, 40.3% contained data that were not available or not reusable, and an additional 25.5% contained faults in data formatting or data labeling, creating obstacles for data reuse. Our study reveals gaps in data availability, identifies major contributors to data loss, and offers suggestions for improving data archiving practices.

