Ruppert Laura P, He Jinghua, Martin Joel, Eckert George, Ouyang Fangqian, Church Abby, Dexter Paul, Hui Siu, Haggstrom David
J Registry Manag. 2016 Winter;43(4):174-8.
Large automated electronic health records (EHRs), if brought together in a federated data model, have the potential to serve as valuable population-based tools for studying the patterns and effectiveness of treatment. The Indiana Network for Patient Care (INPC) is a unique federated EHR data repository that contains data collected from a large population across various health care settings throughout the state of Indiana. The INPC clinical data environment allows quick access to, and extraction of, information from medical charts. The purpose of this project was to evaluate 2 different methods of record linkage between the Indiana State Cancer Registry (ISCR) and the INPC, to determine the match rate for linkage between ISCR and INPC data for patients diagnosed with cancer, and to assess the completeness of the ISCR based on additional validated cancer cases identified in the INPC EHRs.
METHODS: Deterministic and probabilistic algorithms were applied to link ISCR cases to the INPC. The linkage results were validated by manual review, and accuracy was assessed with positive predictive value (PPV). Medical charts of melanoma and lung cancer cases identified in the INPC but not linked to the ISCR were manually reviewed to identify true incident cancers missed by the ISCR, from which the completeness of the ISCR was estimated for each cancer.
RESULTS: Both deterministic and probabilistic approaches to linking the ISCR and the INPC had extremely high PPV (>99%) for identifying true matches in the overall cohort and in each subcohort. The combined match rate for melanoma and lung cancer cases identified in the ISCR that matched to any patient occurrence in the INPC (regardless of disease) was 85.5% for the complete cohort, 94.4% for melanoma, and 84.4% for lung cancer. The estimated completeness of capture by the ISCR was 84% for melanoma and 98% for lung cancer.
CONCLUSION: Cancer registries can be successfully linked, with a high match rate, to patients' EHR data from institutions participating in a regional health information organization (RHIO). A pragmatic approach to data linkage may apply deterministic and probabilistic approaches together for the diverse purposes of cancer control research. The RHIO has the potential to add value to the state cancer registry through the identification of additional true incident cases, but more advanced approaches, such as natural language processing, are needed.
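As an illustration only, the two linkage strategies and the summary measures named in the abstract can be sketched in Python. The abstract does not specify the matching fields, rules, weights, or thresholds that were used, so everything here is an assumption: the `Person` fields, the exact-match rule, the per-field agreement probabilities in `MU`, the `THRESHOLD` value, and the completeness formula (registry cases divided by registry cases plus validated cases found only in the EHR). The probabilistic score follows the general Fellegi-Sunter log-likelihood-ratio idea and is not claimed to be the study's algorithm.

```python
from dataclasses import dataclass
from math import log2


@dataclass(frozen=True)
class Person:
    # Hypothetical identifying fields; the abstract does not list the real ones.
    ssn: str    # may be empty when unknown
    last: str
    first: str
    dob: str    # YYYY-MM-DD


def deterministic_match(a: Person, b: Person) -> bool:
    """Deterministic linkage: exact agreement on a fixed key set (assumed rule)."""
    return bool(a.ssn) and a.ssn == b.ssn and a.dob == b.dob


# Assumed (m, u) probabilities per field:
# m = P(fields agree | same person), u = P(fields agree | different people).
MU = {
    "ssn": (0.95, 0.001),
    "last": (0.90, 0.01),
    "first": (0.85, 0.02),
    "dob": (0.95, 0.005),
}
THRESHOLD = 15.0  # hypothetical cutoff for declaring a link


def probabilistic_score(a: Person, b: Person) -> float:
    """Sum of log2 likelihood ratios over fields (Fellegi-Sunter style)."""
    score = 0.0
    for field, (m, u) in MU.items():
        va, vb = getattr(a, field), getattr(b, field)
        if not va or not vb:
            continue  # a missing value contributes no evidence either way
        if va == vb:
            score += log2(m / u)              # agreement weight (positive)
        else:
            score += log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score


def probabilistic_match(a: Person, b: Person) -> bool:
    return probabilistic_score(a, b) >= THRESHOLD


def ppv(true_links: int, false_links: int) -> float:
    """Positive predictive value of the declared links after manual review."""
    return true_links / (true_links + false_links)


def match_rate(linked_cases: int, registry_cases: int) -> float:
    """Share of registry cases linked to any record in the EHR repository."""
    return linked_cases / registry_cases


def completeness(registry_cases: int, ehr_only_cases: int) -> float:
    """Assumed definition: registry cases over all known true cases."""
    return registry_cases / (registry_cases + ehr_only_cases)


# Made-up records: the EHR copy lacks an SSN, so only the probabilistic rule links it.
registry = Person("123456789", "DOE", "JANE", "1950-01-02")
ehr_same = Person("", "DOE", "JANE", "1950-01-02")
ehr_other = Person("", "ROE", "JOHN", "1950-01-02")
```

With these made-up records, the deterministic rule rejects the pair lacking an SSN while the probabilistic rule accepts it, which is one reason the two approaches may be applied together.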