Beyrer Julie, Nelson David R, Sheffield Kristin M, Huang Yu-Jing, Ellington Tim, Hincapie Ana L
Eli Lilly and Company, Indianapolis, Indiana, USA.
DeLisle Associates LTD, Indianapolis, Indiana, USA.
Pharmacoepidemiol Drug Saf. 2020 Nov;29(11):1465-1479. doi: 10.1002/pds.5137. Epub 2020 Oct 4.
Our aim was to develop and validate a practical US healthcare claims algorithm for identifying incident lung cancer that improves on the positive predictive value (PPV) and sensitivity observed in past studies.
Patients newly diagnosed with lung cancer in Surveillance, Epidemiology, and End Results (SEER) (gold standard) were linked with Medicare claims. A 5% Medicare "other cancer" sample and noncancer sample served as controls. A split-sample validation approach was used. Rules-based, regression, and machine learning models for developing algorithms were explored. Algorithms were developed in the model building subset. Rules-based algorithms and those with the highest F scores were evaluated in the validation subset. F scores were compared for 1000 bootstrap samples. Misclassification was evaluated by calculating the odds of selection by the algorithm among true positives and true negatives.
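The bootstrap comparison of F scores described above can be illustrated with a minimal sketch. It assumes that "F score" means the balanced F1 score (harmonic mean of sensitivity and PPV, scaled by 100) and that each bootstrap sample resamples patients with replacement. The function names `f_score` and `bootstrap_f_diff` and the toy labels are hypothetical and used only for illustration. This is not the authors' code or data.

```python
import random

def f_score(y_true, y_pred):
    """Balanced F score (x100): harmonic mean of sensitivity and PPV (assumed definition)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    if tp == 0:
        return 0.0
    sensitivity = tp / (tp + fn)  # share of gold-standard cases the algorithm selects
    ppv = tp / (tp + fp)          # share of selected patients who are true cases
    return 100 * 2 * sensitivity * ppv / (sensitivity + ppv)

def bootstrap_f_diff(y_true, pred_a, pred_b, n_boot=1000, seed=0):
    """Resample patients with replacement; return F(A) - F(B) for each bootstrap sample."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        fa = f_score(yt, [pred_a[i] for i in idx])
        fb = f_score(yt, [pred_b[i] for i in idx])
        diffs.append(fa - fb)
    return diffs

# Hypothetical toy data: 50 true cases (1) followed by 50 controls (0).
y_true = [1] * 50 + [0] * 50
pred_a = [1] * 45 + [0] * 5 + [1] * 5 + [0] * 45    # 45 TP, 5 FN, 5 FP
pred_b = [1] * 40 + [0] * 10 + [1] * 15 + [0] * 35  # 40 TP, 10 FN, 15 FP

diffs = sorted(bootstrap_f_diff(y_true, pred_a, pred_b, n_boot=1000))
share_a_better = sum(d > 0 for d in diffs) / len(diffs)
ci_95 = (diffs[25], diffs[974])  # simple percentile interval for F(A) - F(B)
print(round(f_score(y_true, pred_a), 2), round(f_score(y_true, pred_b), 2))
print(share_a_better, ci_95)
```

Resampling whole patients keeps each patient's true label paired with both algorithms' predictions, so the two F scores are compared on the same bootstrap sample.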
A practical single-score algorithm derived from a logistic regression model had a sensitivity of 78.22% and a PPV of 78.50% (F score: 78.36). The algorithm was most likely to misclassify patients who were older (aged ≥80 years) or who had missing data in the SEER registry, shorter follow-up time in Medicare (<3 months), insurance through Veterans Affairs, >1 cancer in SEER, or certain Charlson comorbidities (dementia, chronic pulmonary disease, liver disease, or myocardial infarction).
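As a quick arithmetic check, assuming the reported F score is the balanced F1 score (the harmonic mean of sensitivity and PPV), the two reported percentages reproduce the reported F score. The helper name `harmonic_f` is hypothetical.

```python
def harmonic_f(sensitivity_pct, ppv_pct):
    """Balanced F score: harmonic mean of sensitivity and PPV, both in percent (assumed definition)."""
    return 2 * sensitivity_pct * ppv_pct / (sensitivity_pct + ppv_pct)

print(f"{harmonic_f(78.22, 78.50):.2f}")  # prints 78.36
```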
In this dataset, a practical point-based algorithm for identifying incident lung cancer demonstrated significant and substantial improvement (7.9% and 23.9% absolute improvement in sensitivity and PPV, respectively) compared with a current standard.