CHU Lille, INCLUDE: Integration Center of the Lille University hospital for Data Exploration, F-59000, Lille, France.
Univ. Lille, CHU Lille, ULR 2694 - METRICS, Public health dept, F-59000, Lille, France.
Stud Health Technol Inform. 2021 Nov 18;287:94-98. doi: 10.3233/SHTI210823.
The use of international laboratory terminologies inside hospital information systems is required to conduct data reuse analyses through inter-hospital databases. While most terminology matching techniques used for semantic interoperability are language-based, an alternative strategy is distribution matching, which matches terms based on the statistical similarity of their value distributions. In this work, our objective is to design and assess a structured framework to perform distribution matching on concepts described by continuous variables. We propose a framework that combines distribution matching and machine learning techniques. Using a training sample of correct and incorrect correspondences between different terminologies, a match probability score is built. For each term, the best candidates are returned, sorted in decreasing order of the probability given by the model. When 101 terms from Lille University Hospital were searched among the same list of concepts in MIMIC-III, the model returned the correct match in the top 5 candidates for 96 of them (95%). Using this open-source framework with a top-k suggestion system could make expert validation of terminology alignments easier.
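The abstract does not state which distribution statistics or which classifier the framework uses, so the following is only a minimal sketch under explicit assumptions. Each candidate pair is summarized by three hypothetical features: the two-sample Kolmogorov-Smirnov statistic, a relative median gap, and a relative interquartile-range gap. A plain logistic regression, standing in for the paper's machine-learning model, is fitted on labeled correct/incorrect pairs, and the top-k candidates are returned in decreasing order of match probability. The concept names, their distributions, and the assumption that both sites report values in the same units are all illustrative.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def quantile(sorted_xs, q):
    """Nearest-rank quantile of an already sorted list."""
    return sorted_xs[round(q * (len(sorted_xs) - 1))]


def pair_features(a, b):
    """Hypothetical distribution-similarity features for one candidate pair."""
    sa, sb = sorted(a), sorted(b)
    med_a, med_b = quantile(sa, 0.5), quantile(sb, 0.5)
    iqr_a = quantile(sa, 0.75) - quantile(sa, 0.25)
    iqr_b = quantile(sb, 0.75) - quantile(sb, 0.25)
    return [
        ks_statistic(sa, sb),
        abs(med_a - med_b) / (abs(med_a) + abs(med_b) + 1e-9),
        abs(iqr_a - iqr_b) / (iqr_a + iqr_b + 1e-9),
    ]


def sigmoid(z):
    # Numerically stable logistic function (exp argument is never positive).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, x):
    """Match probability for a feature vector x."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=1.0, epochs=3000):
    """Batch gradient descent on the logistic loss (stand-in classifier)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict((w, b), xi) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def top_k(model, query, candidates, k=5):
    """Rank candidate concepts by decreasing match probability."""
    scored = sorted(
        ((predict(model, pair_features(query, vals)), name) for name, vals in candidates.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


# Demo on synthetic data: hypothetical lab concepts, same units at both sites.
rng = random.Random(0)
CONCEPTS = {  # (mean, sd) per concept, illustrative values only
    "sodium": (140, 3), "potassium": (4.2, 0.4), "glucose": (6.0, 1.2),
    "creatinine": (80, 20), "platelets": (250, 60),
}


def draw(concept, n=200):
    mu, sd = CONCEPTS[concept]
    return [rng.gauss(mu, sd) for _ in range(n)]


site_a = {c: draw(c) for c in CONCEPTS}
site_b = {c: draw(c) for c in CONCEPTS}

# Training sample: correct (same concept) and incorrect (different concept) pairs.
X, y = [], []
for ca, va in site_a.items():
    for cb, vb in site_b.items():
        X.append(pair_features(va, vb))
        y.append(1 if ca == cb else 0)
model = train_logistic(X, y)

query = draw("sodium")  # fresh sample standing in for one local term
ranking = top_k(model, query, site_b, k=3)
print(ranking)
```

Logistic regression is used here only because it outputs a probability directly and fits in a few lines; any probabilistic classifier could be swapped in. In the evaluation reported above, k = 5 and a search counted as successful when the correct MIMIC-III concept appeared among the returned candidates.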