VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT.
Durham VA Medical Center, Durham, NC.
JCO Clin Cancer Inform. 2023 Sep;7:e2300085. doi: 10.1200/CCI.23.00085.
Several novel therapies for castration-resistant prostate cancer (CRPC) have been approved on the basis of randomized phase III studies, and continuing observational research is either planned or ongoing. Accurately identifying patients with CRPC in electronic health care data is critical for quality observational research, resource allocation, and quality improvement. Previous work in this area has relied on either structured laboratory results and medication data or natural language processing (NLP) methods. However, a computable phenotype that uses both structured data and NLP identifies these patients more accurately.
The Corporate Data Warehouse (CDW) of the Veterans Health Administration (VHA) was used to collect prostate cancer (PCa) diagnoses, prostate-specific antigen test results, and information on patient characteristics and medication use. The final system used for validation and subsequent analysis combined the NLP system with an algorithm based on structured laboratory and medication data to identify patients diagnosed with CRPC. Patients with both a documented diagnosis of CRPC and a documented diagnosis of metastatic PCa were classified by this system as having metastatic CRPC (mCRPC).
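The classification rule described above can be illustrated with a minimal sketch. The abstract does not state how the NLP output and the structured algorithm are combined, so the logical OR used here is an assumption, and the field and function names are hypothetical rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class PatientFlags:
    # All three fields are hypothetical names used for illustration only.
    nlp_crpc: bool         # NLP system found CRPC documented in clinical notes
    structured_crpc: bool  # laboratory/medication algorithm flagged CRPC
    metastatic_pca: bool   # documented diagnosis of metastatic PCa


def classify(patient: PatientFlags) -> str:
    """Return 'mCRPC', 'CRPC', or 'not CRPC' for one patient."""
    # ASSUMPTION: the two CRPC signals are combined with a logical OR.
    # The source only says the systems were "combined".
    has_crpc = patient.nlp_crpc or patient.structured_crpc
    if not has_crpc:
        return "not CRPC"
    # Stated in the source: CRPC plus metastatic PCa is classified as mCRPC.
    return "mCRPC" if patient.metastatic_pca else "CRPC"
```

For example, `classify(PatientFlags(True, False, True))` returns `"mCRPC"` under these assumptions, whereas a patient with metastatic PCa but no CRPC signal from either source is returned as `"not CRPC"`.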
Among 1.2 million veterans with PCa, the International Classification of Diseases (ICD)-10 diagnosis code for CRPC (Z19.2) identified 3,791 patients from 2016, when the code was created, through 2022. Over the same period, the combined algorithm identified 14,103 patients, 10,312 more than ICD-10 codes alone. The combined algorithm showed a sensitivity of 97.9% and a specificity of 99.2%.
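As a worked check of the figures above, the short sketch below recomputes the difference and ratio between the two patient counts and gives the standard definitions of sensitivity and specificity. The abstract does not report the validation confusion matrix, so the functions are shown without the study's underlying counts; the function and variable names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of true CRPC cases that the algorithm flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of non-CRPC patients that the algorithm correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Patient counts reported in the abstract (2016 to 2022).
ICD10_ONLY = 3_791
COMBINED = 14_103

additional = COMBINED - ICD10_ONLY  # extra patients found by the combined algorithm
ratio = COMBINED / ICD10_ONLY       # combined count relative to ICD-10 alone

print(additional)       # 10312
print(round(ratio, 2))
```

The difference reproduces the 10,312 additional patients stated in the text, and the ratio shows that the combined algorithm found well over three times as many patients as the ICD-10 code alone.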
ICD-10 codes proved insufficient for capturing CRPC in the VHA CDW data. Using both structured and unstructured data identified more than three times as many patients as ICD-10 codes alone (14,103 v 3,791). This combined approach substantially improved identification of real-world patients and enables high-quality observational research in mCRPC.