van Mulligen Erik M, Parry Rowan, van der Lei Johan, Kors Jan A
Dept of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands.
J Biomed Semantics. 2025 Aug 21;16(1):15. doi: 10.1186/s13326-025-00337-2.
The eTRANSAFE project developed tools that support translational research. One of the challenges in this project was to combine preclinical and clinical data, which are coded with different terminologies and at different levels of granularity: clinical data are expressed as single, pre-coordinated concepts, whereas preclinical data are expressed as combinations of concepts from different terminologies. This study develops and evaluates the Rosetta Stone approach, which maps combinations of preclinical concepts to pre-coordinated clinical concepts and allows mappings with different levels of exactness.
Concepts from preclinical and clinical terminologies used in eTRANSAFE have been mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). SNOMED CT acts as an intermediary terminology that provides the semantics to bridge between pre-coordinated clinical concepts and combinations of preclinical concepts with different levels of granularity. The mappings from clinical terminologies to SNOMED CT were taken from existing resources, while mappings from the preclinical terminologies to SNOMED CT were created manually. A coordination template defines the relation types that can be explored for a mapping and assigns a penalty score that reflects the inexactness of the mapping. A subset of 60 pre-coordinated concepts was mapped both with the Rosetta Stone semantic approach and with a lexical term matching approach. Both sets of mappings were evaluated manually.
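The abstract does not specify the format of the coordination template or how penalties are combined. Purely as an illustration, the Python sketch below assumes a toy is-a hierarchy, pre-coordinated clinical concepts defined by a morphology and a finding site (loosely modeled on the SNOMED CT defining attributes "associated morphology" and "finding site"), and a hypothetical template that charges a per-step penalty when a preclinical value has to be related to a broader or narrower clinical value. All concept names, the `TEMPLATE` weights, `max_penalty`, and the function names are invented for this sketch; they are not taken from the eTRANSAFE terminology service or its API.

```python
from dataclasses import dataclass

# Hypothetical toy is-a hierarchy (child -> parent). Labels are illustrative
# only; they are not real SNOMED CT concepts or identifiers.
PARENT = {
    "Hepatocyte": "Liver",
    "Liver": "Abdominal organ",
    "Kidney": "Abdominal organ",
    "Coagulation necrosis": "Necrosis",
    "Necrosis": "Morphologic abnormality",
    "Inflammation": "Morphologic abnormality",
}

# Hypothetical coordination template: the relation types that may be explored
# and the penalty charged per hierarchy step (0 = exact match).
TEMPLATE = {"exact": 0, "broader": 1, "narrower": 2}


@dataclass(frozen=True)
class ClinicalConcept:
    """A pre-coordinated clinical concept defined by two attribute values."""
    name: str
    morphology: str
    site: str


def ancestors(concept):
    """Return [concept, parent, grandparent, ...] up to the hierarchy root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def step_penalty(pre, clin):
    """Penalty for relating a preclinical value to a clinical attribute value,
    or None if the two are not on one is-a path (siblings are not explored)."""
    if pre == clin:
        return TEMPLATE["exact"]
    up = ancestors(pre)
    if clin in up:  # clinical value is broader than the preclinical one
        return TEMPLATE["broader"] * up.index(clin)
    down = ancestors(clin)
    if pre in down:  # clinical value is narrower than the preclinical one
        return TEMPLATE["narrower"] * down.index(pre)
    return None


def map_combination(morphology, site, clinical, max_penalty=4):
    """Map a preclinical (morphology, site) combination to clinical concepts,
    returning (total_penalty, name) pairs sorted from most to least exact."""
    results = []
    for concept in clinical:
        pm = step_penalty(morphology, concept.morphology)
        ps = step_penalty(site, concept.site)
        if pm is None or ps is None:
            continue
        total = pm + ps
        if total <= max_penalty:
            results.append((total, concept.name))
    return sorted(results)


CLINICAL = [
    ClinicalConcept("Hepatic necrosis", "Necrosis", "Liver"),
    ClinicalConcept("Hepatocellular necrosis", "Necrosis", "Hepatocyte"),
    ClinicalConcept("Nephritis", "Inflammation", "Kidney"),
    ClinicalConcept("Hepatitis", "Inflammation", "Liver"),
]

if __name__ == "__main__":
    print(map_combination("Necrosis", "Liver", CLINICAL))
    # [(0, 'Hepatic necrosis'), (2, 'Hepatocellular necrosis')]
    print(map_combination("Coagulation necrosis", "Hepatocyte", CLINICAL))
    # [(1, 'Hepatocellular necrosis'), (2, 'Hepatic necrosis')]
```

In this sketch a score of 0 marks an exact mapping and larger scores mark increasingly inexact ones, so lowering `max_penalty` keeps only the closest matches. Penalizing narrower clinical concepts more heavily than broader ones is an arbitrary assumption made here, on the reasoning that mapping to a more specific concept asserts more than the preclinical finding states.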
A total of 34,308 concepts from preclinical terminologies (Histopathology terminology, Standard for Exchange of Nonclinical Data (SEND) code lists, Mouse Adult Gross Anatomy Ontology) and a clinical terminology (MedDRA) were mapped to SNOMED CT as the intermediary bridging terminology. A terminology service was developed that dynamically returns the exact and inexact mappings between preclinical and clinical concepts. On the evaluation set, the precision of the mappings returned by the terminology service was high (95%), much higher than that of lexical term matching (22%).
The Rosetta Stone approach uses a semantically rich intermediary terminology to map between pre-coordinated clinical concepts and combinations of preclinical concepts with different levels of exactness. Because it can generate inexact as well as exact mappings, the approach makes it possible to relate larger amounts of preclinical and clinical data, which can be helpful in translational use cases.