Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany.
Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany.
J Alzheimers Dis. 2024;99(4):1409-1423. doi: 10.3233/JAD-240116.
Despite numerous efforts to semantically harmonize Alzheimer's disease (AD) cohort studies, no automatic harmonization tool has yet been developed.
As cohort studies form the basis of data-driven analysis, harmonizing them is crucial for cross-cohort analysis. We aimed to accelerate this task by constructing an automatic harmonization tool.
We created a common data model (CDM) by cross-mapping data from 20 cohorts, three CDMs, and ontology terms, and then used this CDM to fine-tune a BioBERT model. Finally, we evaluated the model on three previously unseen cohorts and compared its performance to a string-matching baseline model.
Here, we present our AD-Mapper interface for automatic harmonization of AD cohort studies, which outperformed a string-matching baseline on previously unseen cohort studies. We showcase our CDM comprising 1218 unique variables.
AD-Mapper leverages semantic similarities in naming conventions across cohorts to improve mapping performance.
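To make the comparison concrete, the following is a minimal sketch of what a string-matching baseline for variable mapping could look like. It is not the baseline used in the study, whose details are not given here. It assumes character-level similarity from Python's standard `difflib`, and the function names, the example CDM terms, and the 0.6 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical CDM terms, chosen only for illustration.
CDM_TERMS = ["Mini-Mental State Examination", "Age", "APOE genotype"]


def normalize(name: str) -> str:
    """Lowercase a name and treat underscores and hyphens as spaces."""
    cleaned = name.lower().replace("_", " ").replace("-", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def map_variable(variable: str, cdm_terms: list[str], threshold: float = 0.6):
    """Return (best matching CDM term, score), or (None, score) below threshold.

    The threshold is an assumed value, not one taken from the study.
    """
    best_term, best_score = None, 0.0
    for term in cdm_terms:
        score = similarity(variable, term)
        if score > best_score:
            best_term, best_score = term, score
    if best_score < threshold:
        return None, best_score
    return best_term, best_score


if __name__ == "__main__":
    for raw_name in ["mini_mental_state_examination", "MMSE"]:
        print(raw_name, "->", map_variable(raw_name, CDM_TERMS))
```

In this sketch, a cohort variable that spells out the CDM term is matched, whereas an abbreviation such as `MMSE` shares few characters with the full term and is left unmapped. Cases of this kind are where a model that captures semantic similarity could be expected to help.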