Sada Yvonne, Hou Jason, Richardson Peter, El-Serag Hashem, Davila Jessica
*Michael E. DeBakey Veterans Administration Medical Center and Baylor College of Medicine †Health Services Research and Development Section Departments of ‡Oncology §Gastroenterology, Baylor College of Medicine, Houston, TX.
Med Care. 2016 Feb;54(2):e9-14. doi: 10.1097/MLR.0b013e3182a30373.
Accurate identification of hepatocellular cancer (HCC) cases from automated data is needed for efficient and valid quality improvement initiatives and research. We validated HCC International Classification of Diseases, 9th Revision (ICD-9) codes, and evaluated whether natural language processing by the Automated Retrieval Console (ARC) for document classification improves HCC identification.
We identified a cohort of patients with ICD-9 codes for HCC during 2005-2010 from Veterans Affairs administrative data. Pathology and radiology reports were reviewed to confirm HCC. The positive predictive value (PPV), sensitivity, and specificity of ICD-9 codes were calculated. A split validation study of pathology and radiology reports was performed to develop and validate ARC algorithms. Reports were manually classified as diagnostic of HCC or not. ARC generated document classification algorithms using the Clinical Text Analysis and Knowledge Extraction System. ARC performance was compared with manual classification. PPV, sensitivity, and specificity of ARC were calculated.
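As an illustration of the metrics named above, the following is a minimal sketch of how PPV, sensitivity, and specificity can be computed once each patient (or report) has both an automated label and a manual-review label. The names `ConfusionCounts` and `tally` are hypothetical, and the labels in the usage example are invented for demonstration. They are not study data, and this is not the authors' code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing automated labels against manual review."""
    tp: int  # automated positive, manual review positive
    fp: int  # automated positive, manual review negative
    fn: int  # automated negative, manual review positive
    tn: int  # automated negative, manual review negative


def tally(predicted: Iterable[bool], reference: Iterable[bool]) -> ConfusionCounts:
    """Count agreement between automated labels and the manual reference standard."""
    tp = fp = fn = tn = 0
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            tp += 1
        elif pred and not ref:
            fp += 1
        elif not pred and ref:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value: share of automated positives that are true cases."""
    return c.tp / (c.tp + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true cases that the automated method flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-cases that the automated method correctly leaves unflagged."""
    return c.tn / (c.tn + c.fp)


# Invented labels for five hypothetical reports (not study data).
automated = [True, True, False, False, True]
manual = [True, False, False, True, True]
counts = tally(automated, manual)
```

Note that sensitivity and specificity need manually reviewed negatives as well as positives, so the reference sample cannot consist only of records the automated method flagged.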
ICD-9 codes identified 1138 patients as having HCC; manual review confirmed HCC in 773 of them. The HCC ICD-9 code algorithm had a PPV of 0.67, sensitivity of 0.95, and specificity of 0.93. For a random subset of 619 patients, we identified 471 pathology reports for 323 patients and 943 radiology reports for 557 patients. The pathology ARC algorithm had a PPV of 0.96, sensitivity of 0.96, and specificity of 0.97. The radiology ARC algorithm had a PPV of 0.75, sensitivity of 0.94, and specificity of 0.68.
A combined approach of ICD-9 codes and natural language processing of pathology and radiology reports improves HCC case identification in automated data.