Linked Entity Attribute Pair (LEAP): A Harmonization Framework for Data Pooling.

Affiliations

Memorial Sloan Kettering Cancer Center, New York, NY.

Center for Translational Data Science, University of Chicago, Chicago, IL.

Publication information

JCO Clin Cancer Inform. 2020 Aug;4:691-699. doi: 10.1200/CCI.20.00037.

DOI: 10.1200/CCI.20.00037
PMID: 32755461
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7469618/

Abstract

PURPOSE

As data-sharing projects become increasingly frequent, so does the need to map data elements between multiple classification systems. A generic, robust, shareable architecture will result in increased efficiency and transparency of the mapping process, while upholding the integrity of the data.

MATERIALS AND METHODS

The American Association for Cancer Research's Genomics Evidence Neoplasia Information Exchange (GENIE) collects clinical and genomic data for precision cancer medicine. As part of its commitment to open science, GENIE has partnered with the National Cancer Institute's Genomic Data Commons (GDC) as a secondary repository. After initial efforts to submit data from GENIE to GDC failed, we realized the need for a solution to allow for the iterative mapping of data elements between dynamic classification systems. We developed the Linked Entity Attribute Pair (LEAP) database framework to store and manage the term mappings used to submit data from GENIE to GDC.
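
As a rough illustration of the idea, and not the authors' schema (the abstract does not describe one), a term mapping can be modeled as a link between two entity-attribute pairs, one per classification system. The class names, field names, and example values below are all assumptions made for this sketch; the sample codes are placeholders and are not taken from the paper's mapping tables.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class EntityAttributePair:
    """A term in one classification system (hypothetical structure)."""
    system: str     # e.g. "GENIE" or "GDC"
    entity: str     # the data element, e.g. "cancer_type"
    attribute: str  # the permitted value, e.g. a code or label


class MappingStore:
    """In-memory stand-in for a mapping database; illustrative only."""

    def __init__(self) -> None:
        # Each source pair is linked to exactly one target pair.
        self._links: Dict[EntityAttributePair, EntityAttributePair] = {}

    def link(self, source: EntityAttributePair, target: EntityAttributePair) -> None:
        """Create or overwrite the link from a source pair to a target pair."""
        self._links[source] = target

    def translate(self, source: EntityAttributePair) -> Optional[EntityAttributePair]:
        """Return the linked target pair, or None if no mapping exists."""
        return self._links.get(source)

    def unmapped(self, sources: Iterable[EntityAttributePair]) -> List[EntityAttributePair]:
        """List the source pairs that still have no link, in input order."""
        return [s for s in sources if s not in self._links]


# Placeholder terms for illustration only.
store = MappingStore()
src = EntityAttributePair("GENIE", "cancer_type", "LUAD")
tgt = EntityAttributePair("GDC", "morphology", "8140/3")
store.link(src, tgt)
```

Keeping the pairs immutable (`frozen=True`) makes them usable as dictionary keys, so a lookup is a single hash access and the `unmapped` helper can flag terms that still need attention before a submission.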

RESULTS

After creating and populating the LEAP framework, we identified 195 mappings from GENIE to GDC requiring remediation and observed a 28% reduction in effort to resolve these issues, as well as a reduction in inadvertent errors. These results led to a decrease in the time to map between OncoTree, the cancer type ontology used by GENIE, and International Classification of Diseases for Oncology, 3rd Edition, used by GDC, from several months to less than 1 week.

CONCLUSION

The LEAP framework provides a streamlined mapping process among various classification systems and allows for reusability so that efforts to create or adjust mappings are straightforward. The ability of the framework to track changes over time streamlines the process to map data elements across various dynamic classification systems.
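
One way to picture the change tracking described here is an append-only history in which each revision of a mapping is kept rather than overwritten. This is only a sketch under that assumption: the abstract does not say how LEAP records changes, and every name below (including the integer version label) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MappingRevision:
    """One recorded state of a mapping (hypothetical structure)."""
    source_term: str
    target_term: str
    version: int    # assumed integer label for the mapping release
    note: str = ""  # free-text reason for the change


class MappingHistory:
    """Append-only log of mapping revisions, keyed by source term."""

    def __init__(self) -> None:
        self._revisions: Dict[str, List[MappingRevision]] = {}

    def record(self, source_term: str, target_term: str,
               version: int, note: str = "") -> None:
        """Append a revision; versions must strictly increase per term."""
        revisions = self._revisions.setdefault(source_term, [])
        if revisions and version <= revisions[-1].version:
            raise ValueError("versions must increase for a given source term")
        revisions.append(MappingRevision(source_term, target_term, version, note))

    def current(self, source_term: str) -> Optional[str]:
        """Return the most recent target term, or None if never mapped."""
        revisions = self._revisions.get(source_term)
        return revisions[-1].target_term if revisions else None

    def as_of(self, source_term: str, version: int) -> Optional[str]:
        """Return the target term in effect at the given version, if any."""
        result = None
        for rev in self._revisions.get(source_term, []):
            if rev.version <= version:
                result = rev.target_term
        return result

    def changes(self, source_term: str) -> List[MappingRevision]:
        """Return a copy of the full revision list for a source term."""
        return list(self._revisions.get(source_term, []))


history = MappingHistory()
history.record("SRC_A", "TGT_1", version=1, note="initial mapping")
history.record("SRC_A", "TGT_2", version=2, note="target vocabulary updated")
```

Because nothing is overwritten, `as_of` can reproduce the mapping that was in effect for an earlier submission, which is useful when both classification systems keep evolving.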

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2ab/7469618/4383bd65d470/CCI.20.00037f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2ab/7469618/a9a684a1e234/CCI.20.00037f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2ab/7469618/2dbef7ab4111/CCI.20.00037f3.jpg
