Renner Robinette, Jiang Guoqian
University of San Francisco, San Francisco, CA.
Mayo Clinic, Rochester, MN, USA.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:517-526. eCollection 2020.
While using data standards can facilitate research by making it easier to share data, manually mapping to data standards creates an obstacle to their adoption. Semi-automated mapping strategies can reduce the manual mapping burden. Machine learning approaches, such as artificial neural networks, can predict mappings between clinical data standards but are limited by the need for training data. We developed a graph database that incorporates the Biomedical Research Integrated Domain Group (BRIDG) model, Common Data Elements (CDEs) from the National Cancer Institute's (NCI) cancer Data Standards Registry and Repository, and the NCI Thesaurus. We then used a shortest path algorithm to predict mappings from CDEs to classes in the BRIDG model. The resulting graph database provides a robust semantic framework for analysis and quality assurance testing. Using the graph database to predict CDE to BRIDG class mappings was limited by the subjective nature of mapping and data quality issues.
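The sketch below shows how a shortest-path search can propose a CDE to BRIDG class mapping. It is a minimal illustration, not the authors' implementation, and it rests on several assumptions. The graph is a small in-memory, undirected, unweighted adjacency list rather than a graph database. All node names are hypothetical placeholders, not real caDSR identifiers or NCI Thesaurus codes. The nearest node whose name starts with `BRIDG:` is taken as the predicted class. On an unweighted graph, breadth-first search finds a shortest path.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical toy graph: CDE -> NCIt concept -> BRIDG class links.
# Node names are illustrative placeholders only (assumption), not real
# caDSR CDE identifiers, NCI Thesaurus codes, or the authors' schema.
EDGES: List[Tuple[str, str]] = [
    ("CDE:PatientBirthDate", "NCIt:BirthDate"),
    ("NCIt:BirthDate", "NCIt:Person"),
    ("NCIt:Person", "BRIDG:Person"),
    ("CDE:TumorLongestDiameter", "NCIt:TumorSize"),
    ("NCIt:TumorSize", "NCIt:Measurement"),
    ("NCIt:Measurement", "BRIDG:PerformedObservationResult"),
    ("NCIt:Person", "NCIt:Measurement"),
]


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build an undirected adjacency list from (node, node) pairs."""
    graph: Dict[str, List[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def predict_bridg_class(graph: Dict[str, List[str]], cde: str) -> Optional[List[str]]:
    """Breadth-first search from a CDE node.

    Returns the shortest path (as a node list) to the nearest node whose
    name starts with 'BRIDG:', or None if no BRIDG class is reachable.
    Ties are broken by adjacency-list insertion order.
    """
    if cde not in graph:
        return None
    parent: Dict[str, Optional[str]] = {cde: None}  # also serves as the visited set
    queue = deque([cde])
    while queue:
        node = queue.popleft()
        if node.startswith("BRIDG:"):
            # Walk parent links back to the start to recover the path.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(predict_bridg_class(g, "CDE:PatientBirthDate"))
    print(predict_bridg_class(g, "CDE:TumorLongestDiameter"))
```

The nearest BRIDG class is only a candidate. As the abstract notes, mapping is partly subjective and depends on data quality, so a graph-proximity prediction like this would still need expert review.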