United BioSource Corporation, Lexington, MA, USA.
Pharmacoepidemiol Drug Saf. 2012 May;21 Suppl 2:21-8. doi: 10.1002/pds.3247.
To develop algorithms to identify metastatic cancer in claims data, using tumor stage from an oncology electronic medical record (EMR) data warehouse as the gold standard.
Data from an outpatient oncology EMR database were linked to medical and pharmacy claims data. Patients diagnosed with breast, lung, colorectal, or prostate cancer with a stage recorded in the EMR between 2004 and 2010 and with medical claims available were eligible for the study. Separate algorithms were developed for each tumor type using variables from the claims, including diagnoses, procedures, drugs, and oncologist visits. Candidate variables were reviewed by two oncologists. For each tumor type, the selected variables were entered into a classification and regression tree model to determine the algorithm with the best combination of positive predictive value (PPV), sensitivity, and specificity.
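As an illustration of the evaluation step described above, the following minimal sketch computes PPV, sensitivity, and specificity for a claims-based flag against an EMR stage gold standard. It is not the study's algorithm: the rule `claims_flag` is a hand-written stand-in for the fitted classification and regression tree, and the field names (`secondary_dx`, `chemo`, `emr_metastatic`) and the eight toy records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    ppv: float
    sensitivity: float
    specificity: float


def claims_flag(patient: dict) -> bool:
    """Hypothetical rule: flag as metastatic if the claims show a
    secondary-neoplasm diagnosis or chemotherapy. A stand-in for a
    fitted tree, not the published algorithm."""
    return patient["secondary_dx"] or patient["chemo"]


def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(patients: list) -> Metrics:
    """Compare the claims-based flag with the EMR gold standard."""
    tp = fp = fn = tn = 0
    for p in patients:
        predicted = claims_flag(p)
        actual = p["emr_metastatic"]
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return Metrics(
        ppv=safe_ratio(tp, tp + fp),          # flagged patients who are truly metastatic
        sensitivity=safe_ratio(tp, tp + fn),  # metastatic patients who are flagged
        specificity=safe_ratio(tn, tn + fp),  # non-metastatic patients left unflagged
    )


# Toy records (hypothetical, for illustration only).
toy_patients = [
    {"secondary_dx": True,  "chemo": True,  "emr_metastatic": True},
    {"secondary_dx": True,  "chemo": False, "emr_metastatic": True},
    {"secondary_dx": False, "chemo": True,  "emr_metastatic": False},
    {"secondary_dx": False, "chemo": False, "emr_metastatic": True},
    {"secondary_dx": False, "chemo": False, "emr_metastatic": False},
    {"secondary_dx": False, "chemo": False, "emr_metastatic": False},
    {"secondary_dx": True,  "chemo": False, "emr_metastatic": True},
    {"secondary_dx": False, "chemo": False, "emr_metastatic": False},
]

metrics = evaluate(toy_patients)
print(metrics)
```

Tightening the rule (for example, requiring both a secondary-neoplasm code and chemotherapy) would generally flag fewer patients, which tends to raise PPV and specificity at the cost of sensitivity.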
A total of 1385 breast, 1036 lung, 727 colorectal, and 267 prostate cancer patients qualified for the analysis. The algorithms varied by tumor type but typically included International Classification of Diseases-Ninth Revision codes for secondary neoplasms and use of chemotherapy and other agents typically given for metastatic disease. Across the final models, PPV ranged from 0.75 to 0.86, specificity from 0.75 to 0.97, and sensitivity from 0.60 to 0.81.
While most of these algorithms for metastatic cancer had good specificity and acceptable PPV, a tradeoff with sensitivity prevented any model from performing well on all three measures. The results suggest that accurate ascertainment of metastatic status may require access to medical records or other confirmatory data sources.