Beyrer Julie, Nelson David R, Sheffield Kristin M, Huang Yu-Jing, Lau Yiu-Keung, Hincapie Ana L
Eli Lilly and Company, Indianapolis, IN, USA.
University of Cincinnati James L. Winkle College of Pharmacy, Cincinnati, OH, USA.
Clin Epidemiol. 2023 Jan 12;15:73-89. doi: 10.2147/CLEP.S389824. eCollection 2023.
We sought to develop and validate an algorithm for identifying incident non-small cell lung cancer (NSCLC) in United States (US) healthcare claims data. The algorithm incorporated diagnoses and procedures, but not medications, to support longer-term relevance and reliability.
Patients with newly diagnosed NSCLC per Surveillance, Epidemiology, and End Results (SEER) served as cases. Controls included patients with newly diagnosed small-cell lung cancer or other lung cancers, plus two 5% random samples: one of patients with other cancers and one of people without cancer. Algorithms derived from logistic regression and machine learning methods either used the entire sample (Approach A) or started from a previous algorithm that identifies patients with lung cancer (Approach B). Sensitivity, specificity, positive predictive value (PPV), negative predictive value, and F-score were calculated, and F-scores were compared across 1000 bootstrap samples. Misclassification was evaluated by calculating the odds of selection by the algorithm among true positives and true negatives.
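The performance measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names `classification_metrics` and `bootstrap_f_scores`, the 0/1 label encoding, and the plain resampling-with-replacement scheme are all assumptions, and the abstract does not describe how the bootstrap comparison was actually implemented.

```python
import random

def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = NSCLC case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    ppv = ratio(tp, tp + fp)
    npv = ratio(tn, tn + fn)
    # F-score: harmonic mean of PPV (precision) and sensitivity (recall).
    f_score = ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
    }

def bootstrap_f_scores(y_true, y_pred, n_boot=1000, seed=0):
    """F-scores over n_boot resamples drawn with replacement (assumed scheme)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = classification_metrics(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
        scores.append(resampled["f_score"])
    return scores

# Toy labels for illustration only; these are not study data.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
toy_boot = bootstrap_f_scores(toy_true, toy_pred)
```

Two candidate algorithms could then be compared by computing both F-scores on the same resampled indices in each iteration, although whether the study paired the resamples this way is not stated.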
The best-performing algorithm used neural networks (Approach B). A 10-variable point-score algorithm was derived from logistic regression (Approach B); its sensitivity was 77.69% and its PPV was 67.61% (F-score = 72.30%). This algorithm was less sensitive for patients who were ≥80 years old, had <3 months of Medicare follow-up, or were missing SEER data on stage, laterality, or site. It was less specific for patients with a SEER primary site of main bronchus, a SEER summary stage 2000 of regional by direct extension only, or pre-index chronic pulmonary disease.
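As a quick consistency check, the reported F-score can be reproduced from the reported sensitivity and PPV, assuming the F-score here is the usual F1 measure (the harmonic mean of the two). The helper name `f_score` is illustrative only.

```python
def f_score(sensitivity, ppv):
    """F1: harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)

# Values reported for the 10-variable point-score algorithm.
reported_f = f_score(0.7769, 0.6761)
print(f"{reported_f:.2%}")  # prints 72.30%
```

The result agrees with the F-score of 72.30% given for the point-score algorithm.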
Building on a previously validated incident lung cancer algorithm, our study developed and validated a practical, 10-variable, point-based algorithm for identifying incident NSCLC cases in a US claims database.