Patient Identification and Tumor Identification Management: Quality Program in a Cancer Multicentric Clinical Data Warehouse.

Authors

Pallier Karine, Prot Olivier, Naldi Simone, Silva Francisco, Denis Thierry, Giry Olivier, Leobon Sophie, Deluche Elise, Tubiana-Mathieu Nicole

Affiliations

Centre de Coordination en Cancérologie de la Haute-Vienne - 3C87, CHU de Limoges, Limoges, France.

Univ. Limoges, CNRS, XLIM, UMR 7252, Limoges, France.

Publication information

Cancer Inform. 2023 May 19;22:11769351231172609. doi: 10.1177/11769351231172609. eCollection 2023.

DOI:10.1177/11769351231172609

PMID:37223319

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10201142/

Abstract

BACKGROUND

The Regional Basis of Solid Tumor (RBST), a clinical data warehouse, centralizes information on cancer patient care from 5 healthcare institutions in 2 French departments.

PURPOSE

To develop patient identification (PI) and tumor identification (TI) algorithms that match heterogeneous data to "real" patients and "real" tumors.

METHODS

A graph database programmed in Java (Neo4j) was used to build the RBST from the data of approximately 20 000 patients. The PI algorithm, which uses the Levenshtein distance, was based on the regulatory criteria for identifying a patient. The TI algorithm was built on 6 characteristics: tumor location, laterality, date of diagnosis, histology, primary status, and metastatic status. Given the heterogeneous nature and semantics of the collected data, organ, synonym, and histology repositories had to be created. The TI algorithm used the Dice coefficient to match tumors.
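
The two similarity measures named above can be sketched as follows. This is a minimal illustration, not the RBST code: the accent-stripping `normalise` step and the use of character bigrams as the sets compared by the Dice coefficient are assumptions, since the abstract does not say how strings were pre-processed or which sets the coefficient was applied to.

```python
import unicodedata


def normalise(s: str) -> str:
    """Assumed pre-processing: trim, strip accents, upper-case ('Hélène' -> 'HELENE')."""
    decomposed = unicodedata.normalize("NFKD", s.strip())
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions
    turning a into b (dynamic programming, keeping only two rows in memory)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def bigrams(s: str) -> set:
    """Assumed feature set: adjacent character pairs of the normalised string."""
    s = normalise(s)
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: set, b: set) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

For instance, `levenshtein(normalise("Hélène"), normalise("HELENE"))` is 0, `levenshtein("DUPONT", "DUPOND")` is 1, and `dice(bigrams("night"), bigrams("nacht"))` is 0.25 (one shared bigram out of four in each string). The two-row formulation keeps memory proportional to the length of `b` while still running in time proportional to the product of both lengths.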

RESULTS

Patients were matched when the given name, surname, sex, and date of birth (day, month, and year) agreed completely. These parameters were assigned weights of 28%, 28%, 21%, and 23%, respectively, with the 23% for date of birth split into 18% for the year, 2.5% for the month, and 2.5% for the day. The algorithm had a sensitivity of 99.69% (95% confidence interval [CI] [98.89%, 99.96%]) and a specificity of 100% (95% CI [99.72%, 100%]). The TI algorithm used the repositories and assigned weights to the diagnosis date and associated organ (37.5% each), laterality (16%), histology (5%), and metastatic status (4%). This algorithm had a sensitivity of 71% (95% CI [62.68%, 78.25%]) and a specificity of 100% (95% CI [94.31%, 100%]).
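
The reported weights can be combined into match scores as in the sketch below. It is a minimal illustration under stated assumptions, not the RBST implementation: the record field names, the edit-distance normalisation of names, the exact comparison of sex and birth-date parts, the `ORGAN_SYNONYMS` stand-in for the repositories, and the use of a weighted Dice coefficient over (feature, value) pairs to combine the TI weights are all hypothetical. The abstract gives no TI decision threshold, so the sketch only returns the TI score.

```python
import math
from datetime import date

# Weights as reported in the abstract; each table sums to 1.0.
PI_WEIGHTS = {"given_name": 0.28, "surname": 0.28, "sex": 0.21,
              "birth_year": 0.18, "birth_month": 0.025, "birth_day": 0.025}
TI_WEIGHTS = {"diagnosis_date": 0.375, "organ": 0.375, "laterality": 0.16,
              "histology": 0.05, "metastatic_status": 0.04}

# Hypothetical stand-in for the organ/synonym repositories (not the RBST's content).
ORGAN_SYNONYMS = {"sein": "breast", "mammary gland": "breast"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """1.0 for identical names, lower as the normalised edit distance grows."""
    a, b = a.strip().upper(), b.strip().upper()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def pi_score(p: dict, q: dict) -> float:
    """Weighted PI score: fuzzy on names, exact on sex and on each birth-date part."""
    score = PI_WEIGHTS["given_name"] * name_similarity(p["given_name"], q["given_name"])
    score += PI_WEIGHTS["surname"] * name_similarity(p["surname"], q["surname"])
    score += PI_WEIGHTS["sex"] * (p["sex"] == q["sex"])
    bp, bq = p["birth_date"], q["birth_date"]
    score += PI_WEIGHTS["birth_year"] * (bp.year == bq.year)
    score += PI_WEIGHTS["birth_month"] * (bp.month == bq.month)
    score += PI_WEIGHTS["birth_day"] * (bp.day == bq.day)
    return score


def is_same_patient(p: dict, q: dict) -> bool:
    """Decision rule from the abstract: complete agreement, i.e. a score of 1."""
    return math.isclose(pi_score(p, q), 1.0)


def tumor_features(t: dict) -> set:
    """(feature, value) pairs for the weighted features; missing (None) values are dropped."""
    pairs = set()
    for key, value in t.items():
        if key not in TI_WEIGHTS or value is None:
            continue
        if key == "organ":  # map local labels to a canonical organ name
            value = ORGAN_SYNONYMS.get(value.lower(), value.lower())
        pairs.add((key, value))
    return pairs


def ti_weighted_dice(t: dict, u: dict) -> float:
    """Weighted Dice coefficient 2*w(A ∩ B) / (w(A) + w(B)) over tumor features."""
    a, b = tumor_features(t), tumor_features(u)

    def weight(pairs: set) -> float:
        return sum(TI_WEIGHTS[key] for key, _ in pairs)

    total = weight(a) + weight(b)
    return 0.0 if total == 0 else 2 * weight(a & b) / total
```

For example, two patient records that differ only in surname spelling (Dupont/Dupond) score about 0.953 and are not matched under the complete-agreement rule. On the tumor side, a record labelled "Sein" instead of "breast" scores 1.0 once the synonym map is applied, and a missing laterality lowers the score to about 0.913 instead of counting as a disagreement, which is one reason a Dice-style denominator may be preferred over a plain weighted sum.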

CONCLUSION

The RBST incorporates 2 quality controls: PI and TI. It facilitates transversal structuring and the assessment of the performance of the care provided.
