Mapping of Alzheimer's disease related data elements and the NIH Common Data Elements.

Affiliations

McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

Department of Neurology, McGovern School of Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA.

Publication information

BMC Med Inform Decis Mak. 2024 Apr 19;24(Suppl 3):103. doi: 10.1186/s12911-024-02500-8.

Abstract

BACKGROUND

Alzheimer's disease (AD) is a devastating disease that destroys memory and other cognitive functions. There has been an increasing research effort to prevent and treat AD. In the US, two major data sharing resources for AD research are the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI). Additionally, the National Institutes of Health (NIH) Common Data Elements (CDE) Repository has been developed to facilitate data sharing and improve interoperability among data sets in various disease research areas.

METHOD

To better understand how interoperable the AD-related data elements in these resources are, we leveraged different representation models to map data elements between resources: NACC to ADNI, NACC to NIH CDE, and ADNI to NIH CDE. We explored bag-of-words-based and word-embedding-based models (Word2Vec and BioWordVec) to perform these data element mappings.
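To make the bag-of-words idea concrete, the following is a minimal sketch of one way such a mapping could work: each data element description is turned into a word-count vector, and a source element is mapped to the target element with the highest cosine similarity above a cutoff. This is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the `threshold` value, the function names, and the example element names and descriptions are all hypothetical. A word-embedding variant would replace the word-count vectors with, for example, averaged word vectors and keep the same similarity step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_elements(source, target, threshold=0.5):
    """Map each source element to its best-scoring target element.

    `threshold` is an assumed cutoff; elements whose best score falls
    below it are left unmapped.
    """
    target_bags = {name: Counter(tokenize(desc)) for name, desc in target.items()}
    mappings = {}
    for name, desc in source.items():
        bag = Counter(tokenize(desc))
        best_name, best_score = None, 0.0
        for t_name, t_bag in target_bags.items():
            score = cosine(bag, t_bag)
            if score > best_score:
                best_name, best_score = t_name, score
        if best_name is not None and best_score >= threshold:
            mappings[name] = (best_name, best_score)
    return mappings


# Hypothetical data elements, invented for illustration only.
source = {
    "SRC_AGE": "Subject age at visit",
    "SRC_SMOKE": "Smoked cigarettes in last 30 days",
}
target = {
    "TGT_AGE": "Age of subject at visit",
    "TGT_EDU": "Years of education completed",
}

mappings = map_elements(source, target)
for src, (tgt, score) in mappings.items():
    print(f"{src} -> {tgt} (cosine similarity {score:.3f})")
```

In this toy example only `SRC_AGE` is mapped, because `SRC_SMOKE` shares no words with either target description. Exact word overlap of this kind tends to favor precision, whereas embedding-based similarity can also match descriptions that use different but related words.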

RESULTS

The data dictionaries downloaded on November 23, 2021 contain 1,195 data elements in NACC, 13,918 in ADNI, and 27,213 in NIH CDE Repository. Data element preprocessing reduced the numbers of NACC and ADNI data elements for mapping to 1,099 and 7,584 respectively. Manual evaluation of the mapping results showed that the bag-of-words based approach achieved the best precision, while the BioWordVec based approach attained the best recall. In total, the three approaches mapped 175 out of 1,099 (15.92%) NACC data elements to ADNI; 107 out of 1,099 (9.74%) NACC data elements to NIH CDE; and 171 out of 7,584 (2.25%) ADNI data elements to NIH CDE.
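The reported mapping percentages follow directly from the counts above. As a quick arithmetic check, the short sketch below recomputes them; the counts are taken from the text, while the dictionary layout and the function name are assumptions made for illustration.

```python
# (mapped, total) counts as reported in the RESULTS paragraph.
counts = {
    "NACC -> ADNI": (175, 1099),
    "NACC -> NIH CDE": (107, 1099),
    "ADNI -> NIH CDE": (171, 7584),
}


def coverage_percent(mapped, total):
    """Share of data elements that were mapped, as a percentage rounded to 2 decimals."""
    return round(100 * mapped / total, 2)


coverage = {pair: coverage_percent(m, t) for pair, (m, t) in counts.items()}
for pair, pct in coverage.items():
    print(f"{pair}: {pct:.2f}%")
```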

CONCLUSIONS

The bag-of-words-based and word-embedding-based approaches showed promise in mapping AD-related data elements between different resources. Although the mapping approaches need further improvement, our results indicate that there is a critical need to standardize CDEs across these valuable AD research resources in order to maximize the discoveries regarding AD pathophysiology, diagnosis, and treatment that can be gleaned from them.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e37/11027215/13e491255203/12911_2024_2500_Fig1_HTML.jpg
