The systematic assessment of completeness of public metadata accompanying omics studies in the Gene Expression Omnibus.

Authors

Huang Yu-Ning, Jaiswal Pooja Vinod, Rajesh Anushka, Yadav Anushka, Yu Dottie, Liu Fangyun, Scheg Grace, Shih Emma, Boldirev Grigore, Nakashidze Irina, Sarkar Aditya, Mehta Jay Himanshu, Wang Ke, Patel Khooshbu Kantibhai, Mirza Mustafa Ali Baig, Hapani Kunali Chetan, Peng Qiushi, Ayyala Ram, Guo Ruiwei, Kapur Shaunak, Ramesh Tejasvene, Ciorbă Dumitru, Munteanu Viorel, Bostan Viorel, Dimian Mihai, Abedalthagafi Malak S, Mangul Serghei

Affiliations

Department of Clinical Pharmacy, Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, California, 90089, USA.

Department of Clinical Pharmacy, Alfred E. Mann School of Pharmacy, University of Southern California, Los Angeles, California, 90089, USA.

Publication details

bioRxiv. 2025 Jul 7:2021.11.22.469640. doi: 10.1101/2021.11.22.469640.

DOI: 10.1101/2021.11.22.469640
PMID: 40672350
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12265520/
Abstract

Recent advances in high-throughput sequencing technologies have made it possible to collect and share massive amounts of omics data, along with their associated metadata. Enhancing metadata availability is critical to ensuring data reusability and reproducibility and to facilitating novel biomedical discoveries through effective data reuse. Yet incomplete metadata accompanying public omics data may hinder reproducibility and reusability by reducing sample interpretability and limiting secondary analyses. In this study, we performed a comprehensive assessment of the completeness of metadata shared in scientific publications and/or public repositories by analyzing over 253 studies encompassing over 164,000 samples, including both human and non-human mammalian studies. We observed that studies often omit over a quarter of important phenotypes, with an average of only 74.8% of them shared either in the text of the publication or in the corresponding repository. Notably, public repositories alone contained 62% of the metadata, surpassing the textual content of publications by 3.5%. Only 11.5% of studies shared all phenotypes, while 37.9% shared less than 40% of the phenotypes. Studies involving non-human samples were more likely to share metadata than studies involving human samples. We observed similar results on an extended dataset spanning 2.1 million samples across over 61,000 studies from the Gene Expression Omnibus repository. The limited availability of metadata reported in our study emphasizes the need for improved metadata sharing practices and standardized reporting. Finally, we discuss the numerous benefits that improving the availability and quality of metadata brings to the scientific community and beyond, supporting data-driven decision-making and policy development in biomedical research. This work provides a scalable framework for evaluating metadata availability and may help guide future policy and infrastructure development.
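
The abstract reports completeness per study as the share of important phenotypes found in the publication text, in the repository, or in either. One way to compute such proportions is sketched here as a hypothetical illustration, not the authors' pipeline. It assumes each study comes with a list of tracked phenotypes plus the subsets found in each source, reads "either" as a set union, and uses made-up accessions and phenotype names. The `Study` type, the `completeness` and `summarize` functions, and the `low_cutoff` parameter (defaulting to 0.4 to mirror the abstract's 40% threshold) are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical per-study record (not the authors' data model)."""

    accession: str
    tracked: frozenset[str]         # phenotypes expected for this study (assumed)
    in_publication: frozenset[str]  # phenotypes found in the paper's text
    in_repository: frozenset[str]   # phenotypes found in the repository record


def completeness(study: Study, source: str = "either") -> float:
    """Fraction of tracked phenotypes available from one source, or from either."""
    if not study.tracked:
        raise ValueError(f"{study.accession}: no tracked phenotypes")
    if source == "publication":
        found = study.in_publication
    elif source == "repository":
        found = study.in_repository
    elif source == "either":
        # "Shared either in the text or the repository", read as a set union.
        found = study.in_publication | study.in_repository
    else:
        raise ValueError(f"unknown source: {source!r}")
    # Anything reported that is not on the tracked list is ignored.
    return len(found & study.tracked) / len(study.tracked)


def summarize(studies: list[Study], low_cutoff: float = 0.4) -> dict[str, float]:
    """Cohort-level proportions analogous to those quoted in the abstract."""
    if not studies:
        raise ValueError("no studies to summarize")
    scores = [completeness(s) for s in studies]
    n = len(scores)
    return {
        "mean_completeness": sum(scores) / n,
        "share_fully_complete": sum(score == 1.0 for score in scores) / n,
        "share_below_cutoff": sum(score < low_cutoff for score in scores) / n,
    }


if __name__ == "__main__":
    # Toy records with made-up accessions and phenotype names.
    tracked = frozenset({"age", "sex", "tissue", "disease", "ethnicity"})
    studies = [
        Study("GSE_A", tracked, frozenset({"age", "sex"}), frozenset({"sex", "tissue"})),
        Study("GSE_B", tracked, tracked, frozenset()),
        Study("GSE_C", tracked, frozenset({"sex"}), frozenset()),
    ]
    for s in studies:
        print(s.accession, completeness(s), completeness(s, "repository"))
    print(summarize(studies))
```

With these toy records, GSE_A scores 3/5 from either source but only 2/5 from the repository alone, which illustrates why union-based completeness can never fall below either single-source figure. The abstract does not state how the "important phenotypes" were chosen for each study, so the `tracked` set here is purely an assumption.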
