
Analysis of data dictionary formats of HIV clinical trials.

Affiliation

Lister Hill National Center for Biomedical Communication, National Library of Medicine, NIH, Bethesda, MD, United States of America.

Publication information

PLoS One. 2020 Oct 5;15(10):e0240047. doi: 10.1371/journal.pone.0240047. eCollection 2020.

DOI: 10.1371/journal.pone.0240047
PMID: 33017454
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7535029/
Abstract

BACKGROUND

Efforts to define research Common Data Elements aim to harmonize data collection across clinical studies.

OBJECTIVE

Our goal was to analyze the quality and usability of data dictionaries of HIV studies.

METHODS

For the clinical domain of HIV, we searched data sharing platforms and acquired a set of 18 HIV-related studies, from which we analyzed 26 328 data elements. We identified existing standards for creating a data dictionary and reviewed their use. To facilitate aggregation across studies, we defined three types of data dictionary (data elements, forms, and permissible values) and created a simple information model for each type.
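The three dictionary types named above can be pictured as three small record types. The abstract does not give the fields of the authors' information models, so every class and field name below is an illustrative assumption, not the published model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical field names; the abstract does not list the model's attributes.
@dataclass
class DataElement:
    name: str                          # variable name as it appears in the data
    data_type: Optional[str] = None    # e.g. "string", "numeric", "date"
    description: Optional[str] = None  # human-readable definition
    form: Optional[str] = None         # name of the group/form it belongs to


@dataclass
class Form:
    name: str
    description: Optional[str] = None


@dataclass
class PermissibleValue:
    element_name: str            # links back to DataElement.name
    code: str                    # stored value, e.g. "1"
    label: Optional[str] = None  # meaning of the code, e.g. "Male"


@dataclass
class DataDictionary:
    study_id: str
    elements: List[DataElement] = field(default_factory=list)
    forms: List[Form] = field(default_factory=list)
    permissible_values: List[PermissibleValue] = field(default_factory=list)

    def values_for(self, element_name: str) -> List[PermissibleValue]:
        """Return the permissible values declared for one data element."""
        return [pv for pv in self.permissible_values
                if pv.element_name == element_name]
```

Keeping permissible values in their own list, linked by element name, mirrors the idea that a study may share that information in a separate file or not at all.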

RESULTS

An average study had 427 data elements (ranging from 46 elements to 9 945 elements). In terms of data type, 48.6% of data elements were string, 47.8% were numeric, 3.0% were date and 0.6% were date-time. No study in our sample explicitly declared a data element as a categorical variable; such elements were instead treated as either string or numeric. We were able to obtain permissible values for only 61% of studies. The majority of studies used CSV files to share a data dictionary, while 22% of the studies used a non-computable PDF format. All studies grouped their data elements. The average number of groups or forms per study was 24 (ranging between 2 and 124 groups/forms). An accurate and well-formatted data dictionary facilitates error-free secondary analysis and can help with data de-identification.
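A data-type breakdown like the one reported above is straightforward to compute when the dictionary is shared as CSV. The following is a minimal sketch, not the authors' code: the column name `data_type` and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def type_distribution(csv_text, type_column="data_type"):
    """Return the percentage of data elements per declared data type.

    `type_column` is an assumed header name; real dictionaries vary.
    Rows with an empty or absent type are counted as "missing".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        (row.get(type_column) or "").strip().lower() or "missing"
        for row in reader
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: round(100 * n / total, 1) for t, n in counts.items()}


# Made-up sample dictionary, for illustration only.
sample = """name,data_type,description
age,numeric,Age at enrollment
sex,string,Sex at birth
visit_date,date,Date of visit
cd4,numeric,CD4 count
"""

print(type_distribution(sample))
```

A dictionary distributed only as PDF cannot be processed this way without manual extraction, which is why the abstract calls that format non-computable.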

CONCLUSION

We saw features of data dictionaries that made them difficult to use and understand. These included multiple data dictionary files, non-machine-readable documents, data elements present in the data but absent from the dictionary, and missing data types or descriptions. Building on experience with aggregating data elements across a large set of studies, we created a set of recommendations (called the CONSIDER statement) that can guide optimal data sharing in future studies.
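Several of the problems listed above can be detected automatically before a dataset is shared. The sketch below is one hypothetical way to do so; the function name, the dictionary keys (`name`, `data_type`, `description`), and the sample rows are assumptions and do not come from the paper or the CONSIDER statement.

```python
def audit_dictionary(dictionary_rows, data_columns):
    """Report common data dictionary gaps.

    dictionary_rows: list of dicts with assumed keys "name", "data_type",
                     and "description" (one dict per data element).
    data_columns:    column names actually present in the shared data file.
    """
    documented = {row["name"] for row in dictionary_rows}

    def is_blank(row, key):
        # Treat absent keys, None, and whitespace-only strings as blank.
        return not (row.get(key) or "").strip()

    return {
        # Present in the data but not described in the dictionary.
        "undocumented_columns": sorted(set(data_columns) - documented),
        "missing_type": sorted(
            r["name"] for r in dictionary_rows if is_blank(r, "data_type")
        ),
        "missing_description": sorted(
            r["name"] for r in dictionary_rows if is_blank(r, "description")
        ),
    }


# Made-up example inputs.
rows = [
    {"name": "age", "data_type": "numeric", "description": "Age at enrollment"},
    {"name": "sex", "data_type": "", "description": "Sex at birth"},
    {"name": "cd4", "data_type": "numeric", "description": None},
]
report = audit_dictionary(rows, ["age", "sex", "cd4", "viral_load"])
print(report)
```

An empty report (three empty lists) would indicate that every column is documented and every element has a type and a description.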
