Challenges in Using a Graph Database to Represent and Analyze Mappings of Cancer Study Data Standards.

Author information

Renner Robinette, Jiang Guoqian

Affiliations

University of San Francisco, San Francisco, CA.

Mayo Clinic, Rochester, MN, USA.

Publication information

AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:517-526. eCollection 2020.

PMID:32477673

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7233100/

Abstract

While using data standards can facilitate research by making it easier to share data, manually mapping to data standards creates an obstacle to their adoption. Semi-automated mapping strategies can reduce the manual mapping burden. Machine learning approaches, such as artificial neural networks, can predict mappings between clinical data standards but are limited by the need for training data. We developed a graph database that incorporates the Biomedical Research Integrated Domain Group (BRIDG) model, Common Data Elements (CDEs) from the National Cancer Institute's (NCI) cancer Data Standards Registry and Repository, and the NCI Thesaurus. We then used a shortest path algorithm to predict mappings from CDEs to classes in the BRIDG model. The resulting graph database provides a robust semantic framework for analysis and quality assurance testing. Using the graph database to predict CDE to BRIDG class mappings was limited by the subjective nature of mapping and data quality issues.
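The shortest-path idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the node names, the edges, the `BRIDG:` prefix convention, and the helper functions `build_graph` and `predict_bridg_class` are all hypothetical, and the abstract does not specify which graph database or algorithm variant was used. The sketch assumes an unweighted, undirected graph in which CDEs link to NCI Thesaurus concepts, which in turn link to BRIDG classes, and it uses a plain breadth-first search to find the nearest BRIDG class.

```python
from collections import deque

# Hypothetical toy edges: CDE -> NCIt concept -> BRIDG class.
# None of these identifiers come from the paper.
EDGES = [
    ("CDE:PatientBirthDate", "NCIt:BirthDate"),
    ("CDE:PatientBirthDate", "NCIt:Patient"),
    ("NCIt:BirthDate", "BRIDG:BiologicEntity"),
    ("NCIt:Patient", "NCIt:Person"),
    ("NCIt:Person", "BRIDG:Person"),
    ("CDE:Orphan", "NCIt:Unlinked"),  # no route to any BRIDG class
]


def build_graph(edges):
    """Build an undirected adjacency list, preserving edge order."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def predict_bridg_class(graph, cde, target_prefix="BRIDG:"):
    """Return the shortest path from `cde` to the nearest node whose name
    starts with `target_prefix`, or None if no such node is reachable."""
    if cde not in graph:
        return None
    parents = {cde: None}  # doubles as the visited set
    queue = deque([cde])
    while queue:
        node = queue.popleft()
        if node.startswith(target_prefix):
            # Walk parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_graph(EDGES)
path = predict_bridg_class(graph, "CDE:PatientBirthDate")
print(path)
```

In this toy graph the CDE reaches `BRIDG:BiologicEntity` in two hops and `BRIDG:Person` in three, so the shorter route wins. That also hints at the limitation the abstract mentions: the closest class in the graph is not necessarily the mapping a human curator would choose, and missing or incorrect concept links change the result.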
