Kroenke Kurt, Ruddy Kathryn J, Pachman Deirdre R, Grzegorczyk Veronica, Herrin Jeph, Rahman Parvez A, Tobin Kyle A, Griffin Joan M, Chlan Linda L, Austin Jessica D, Ridgeway Jennifer L, Mitchell Sandra A, Marsolo Keith A, Cheville Andrea L
Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, United States.
Regenstrief Institute, Inc., Indianapolis, Indiana, United States.
Appl Clin Inform. 2025 May;16(3):556-568. doi: 10.1055/a-2544-3117. Epub 2025 Jun 18.
The Enhanced EHR-facilitated Cancer Symptom Control (E2C2) Trial is a pragmatic trial testing a collaborative care approach for managing common cancer symptoms. Identifying cancer site and metastatic status from routinely collected data proved challenging. This study compares three approaches for determining cancer site and six strategies for identifying the presence of metastasis using electronic health record (EHR) and cancer registry data. The E2C2 cohort included 50,559 patients seen in the medical oncology clinics of a large health system. SPPADE symptoms (sleep disturbance, pain, physical function impairment, anxiety, depression, and energy deficit/fatigue) were assessed with 0 to 10 numeric rating scales (NRS). A multistep process was used to develop three approaches for representing cancer site: the single most prevalent International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) code, the two most prevalent codes, and any diagnostic code. Six approaches for identifying metastatic disease were compared: ICD-10 codes, natural language processing (NLP), cancer registry, medications typically prescribed for incurable disease, treatment plan, and evaluation for phase 1 trials. The approach counting the two most prevalent ICD-10 cancer site diagnoses per patient detected a median of 92% of the cases identified by counting all cancer site diagnoses, whereas the approach counting only the single most prevalent cancer site diagnosis identified a median of 65%. However, agreement among the three approaches was very good (kappa > 0.80) for most cancer sites. The ICD-10 and NLP methods could be applied to the entire cohort and had the highest agreement (kappa = 0.53) for identifying metastasis. Cancer registry data were available for less than half of the patients. Identification of cancer site and metastatic disease using EHR data was feasible in this large and diverse cohort of patients with common cancer symptoms. The methods were pragmatic and may be acceptable for defining covariates, but likely require refinement before being used for key dependent and independent variables.
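The two computations the abstract relies on, flagging a cancer site from a patient's most prevalent ICD-10 codes and measuring agreement with kappa, can be illustrated with a minimal Python sketch. It is not the study's code. The function names, the toy patient records, and the use of category-level codes (C50, C34, C18) are assumptions made for illustration, and Cohen's kappa for two binary classifications is assumed as the agreement statistic.

```python
from collections import Counter


def top_k_codes(codes, k=None):
    """Return the k most frequent codes as a set (all codes if k is None)."""
    counts = Counter(codes)
    if k is None:
        return set(counts)
    return {code for code, _ in counts.most_common(k)}


def site_flags(patients, site, k=None):
    """Flag each patient 1/0 for `site` under the top-k rule (hypothetical helper)."""
    return [int(site in top_k_codes(codes, k)) for codes in patients.values()]


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of 0/1 flags."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    # Chance agreement: both positive plus both negative.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return 1.0  # both raters constant and identical
    return (observed - expected) / (1 - expected)


# Toy data (invented): each patient's list of recorded cancer site codes.
patients = {
    "p1": ["C50", "C50", "C34", "C18"],
    "p2": ["C34", "C34", "C34"],
    "p3": ["C18", "C50", "C50", "C18", "C18"],
}

single = site_flags(patients, "C50", k=1)   # single most prevalent code
top_two = site_flags(patients, "C50", k=2)  # two most prevalent codes
any_code = site_flags(patients, "C50")      # any diagnostic code

print(single, top_two, any_code)
print(cohen_kappa(single, any_code))
```

In this toy example, patient p3 is missed by the single-code rule but captured by the two-code rule, which mirrors the direction of the difference reported in the abstract (65% versus 92% of cases).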