Jouhet Vianney, Mougin Fleur, Bréchat Bérénice, Thiessard Frantz
CHU de Bordeaux, Pole de sante publique, Service d'information medicale, unit IAM, F-33000 Bordeaux, France.
Univ. Bordeaux, Inserm, UMR 1219, Bordeaux, F-33000, France.
J Biomed Semantics. 2017 Feb 7;8(1):6. doi: 10.1186/s13326-017-0114-4.
Identifying incident cancer cases within a population remains essential for scientific research in oncology. Data produced within electronic health records can be useful for this purpose. Because diagnoses are recorded by multiple providers, heterogeneous terminologies such as ICD-10 and ICD-O-3 are used to record oncology diagnoses. To enable disease identification based on these diagnoses, disease classifications in oncology need to be integrated. Our aim was to build a model integrating the concepts involved in two disease classifications, namely ICD-10 (diagnosis) and ICD-O-3 (topography and morphology), despite their structural heterogeneity. Based on the NCI Thesaurus (NCIt), a "derivative" model for linking diagnoses and topography-morphology combinations was defined and built. ICD-O-3 and ICD-10 codes were then used to instantiate classes of the "derivative" model. Links between terminologies obtained through the model were then compared to mappings provided by the Surveillance, Epidemiology, and End Results (SEER) program.
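The linking idea can be illustrated with a minimal sketch. It assumes that each class of the "derivative" model can be reduced to a label, a set of ICD-10 codes and a set of ICD-O-3 (topography, morphology) pairs that instantiate it. The names `DerivativeClass` and `links` are hypothetical, and the two example classes are illustrative rather than taken from the actual model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DerivativeClass:
    """One disease class of a hypothetical, simplified 'derivative' model."""
    label: str
    # ICD-10 diagnosis codes instantiating this class
    icd10: set[str] = field(default_factory=set)
    # ICD-O-3 (topography, morphology) pairs instantiating this class
    icdo3: set[tuple[str, str]] = field(default_factory=set)


def links(classes: list[DerivativeClass]) -> set[tuple[str, tuple[str, str]]]:
    """Link every ICD-10 code to every ICD-O-3 pair instantiating the same class."""
    return {(dx, tm) for c in classes for dx in c.icd10 for tm in c.icdo3}


# Illustrative content only (assumption, not the paper's data).
model = [
    DerivativeClass(
        label="Breast carcinoma",
        icd10={"C50.9"},
        icdo3={("C50.9", "8500/3")},
    ),
    DerivativeClass(
        label="Lung adenocarcinoma",
        icd10={"C34.9"},
        icdo3={("C34.9", "8140/3")},
    ),
]

model_links = links(model)
```

In this simplified view, a link between the two terminologies exists whenever an ICD-10 code and an ICD-O-3 combination instantiate the same class; the hierarchy and relationships inherited from the NCIt are deliberately left out.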
The model integrated 42% of neoplasm ICD-10 codes (excluding metastasis), 98% of ICD-O-3 morphology codes (excluding metastasis) and 68% of ICD-O-3 topography codes. For every code instantiating at least one class in the "derivative" model, comparison with SEER mappings revealed that all mappings were available in the model as a link between the corresponding codes.
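The comparison with reference mappings can be sketched as a set operation, restricted to codes that the model actually integrates. This is a hedged illustration only: the function name `compare_with_reference`, the data layout and the example values are assumptions, not the evaluation code or data used in the study.

```python
def compare_with_reference(model_links, reference, integrated_icd10):
    """Check reference mappings against model links.

    Only reference mappings whose ICD-10 code is integrated in the model
    are considered. Returns the pair (found, missing).
    """
    in_scope = {(dx, tm) for dx, tm in reference if dx in integrated_icd10}
    return in_scope & model_links, in_scope - model_links


# Illustrative values only (assumption, not SEER data).
example_links = {
    ("C50.9", ("C50.9", "8500/3")),
    ("C34.9", ("C34.9", "8140/3")),
}
example_reference = {
    ("C50.9", ("C50.9", "8500/3")),
    ("C61", ("C61.9", "8140/3")),  # ICD-10 code not integrated in this toy model
}
example_integrated = {"C50.9", "C34.9"}

found, missing = compare_with_reference(
    example_links, example_reference, example_integrated
)
```

An empty `missing` set corresponds to the situation reported above, where every in-scope reference mapping is also present as a link in the model.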
We have proposed a method to automatically build a model for integrating ICD-10 and ICD-O-3 based on the NCIt. The resulting "derivative" model is a machine-understandable resource that enables an integrated view of these heterogeneous terminologies. The NCIt structure and the available relationships can help to bridge disease classifications while taking into account their structural and granular heterogeneities. However, (i) inconsistencies exist within the NCIt, leading to misclassifications in the "derivative" model, and (ii) the "derivative" model integrates only part of ICD-10 and ICD-O-3. The NCIt alone is not sufficient for integration purposes, and further work based on other termino-ontological resources is needed to enrich the model and avoid the identified inconsistencies.