Next-Generation Sequencing-Based Testing Among Patients With Advanced or Metastatic Nonsquamous Non-Small Cell Lung Cancer in the United States: Predictive Modeling Using Machine Learning Methods.

Authors

Brnabic Alan James Michael, Lipkovich Ilya, Kadziola Zbigniew, He Dan, Krein Peter M, Hess Lisa M

Affiliations

Eli Lilly and Company, Sydney, Australia.

Eli Lilly and Company, Indianapolis, IN, United States.

Publication

JMIR Cancer. 2025 Jun 11;11:e64399. doi: 10.2196/64399.

Abstract

BACKGROUND

Next-generation sequencing (NGS) has become a cornerstone of treatment for lung cancer and is recommended in current treatment guidelines for patients with advanced or metastatic disease.

OBJECTIVE

This study was designed to use machine learning methods to identify demographic and clinical characteristics of patients with advanced or metastatic non-small cell lung cancer (NSCLC) that may predict the likelihood of receiving NGS-based testing (ever vs never NGS-tested) and, among those tested, the timing of testing (early vs late NGS-tested).

METHODS

Deidentified patient-level data from a real-world cohort of patients with advanced or metastatic NSCLC in the United States were analyzed. Patients were included if they had nonsquamous disease, received systemic therapy for NSCLC, and had at least 3 months of follow-up data. Three strategies were used to identify predictors of ever versus never and early versus late NGS testing: logistic regression, penalized logistic regression with a least absolute shrinkage and selection operator (LASSO) penalty, and extreme gradient boosting with classification trees as base learners. Data were split into D1 (training+validation; 80%) and D2 (testing; 20%) sets. The 3 strategies were compared on their performance across m=1000 repeated splits of D1 into training (70%) and validation (30%) data. The final model was selected on the basis of the area under the receiver operating characteristic curve, while also considering simplicity and clinical interpretability. Performance was then re-estimated on the test data D2.
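The split-and-compare protocol described above can be illustrated with a minimal, dependency-free Python sketch. Everything here is an assumption made for illustration, not the authors' code: the data are synthetic, `fit_logistic`, `split`, and `auc` are hypothetical helpers, the LASSO penalty strength (0.02) is arbitrary, m is reduced from 1000 to 10, and the gradient boosting strategy is left out to keep the example short. The abstract does not say how the final model was refit, so refitting the winner on all of D1 is also an assumption, and the sketch selects purely on mean validation AUC, whereas the study also weighed simplicity and clinical interpretability.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, l1=0.0, lr=0.5, epochs=100):
    """Batch gradient descent; l1 > 0 adds a LASSO penalty via soft-thresholding."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        if l1 > 0:  # proximal step: shrink each weight toward zero
            w = [math.copysign(max(abs(wj) - lr * l1, 0.0), wj) for wj in w]
    return w, b

def predict(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def split(indices, frac, rng):
    """Shuffle a copy of the indices and cut it at the given fraction."""
    idx = indices[:]
    rng.shuffle(idx)
    cut = int(round(frac * len(idx)))
    return idx[:cut], idx[cut:]

def rows(data, idx):
    return [data[i] for i in idx]

# Synthetic stand-in data: only the first two of four features carry signal.
rng = random.Random(0)
n = 300
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(1.5 * x[0] - 1.0 * x[1]) else 0 for x in X]

# Outer split: D1 (training+validation, 80%) and D2 (testing, 20%).
d1, d2 = split(list(range(n)), 0.8, rng)

# Inner loop: repeated 70/30 splits of D1 (the study used m=1000).
strategies = {"logistic": 0.0, "lasso": 0.02}  # name -> assumed L1 strength
m = 10
val_auc = {name: [] for name in strategies}
for _ in range(m):
    train, val = split(d1, 0.7, rng)
    for name, l1 in strategies.items():
        model = fit_logistic(rows(X, train), rows(y, train), l1=l1)
        val_auc[name].append(auc(rows(y, val), predict(model, rows(X, val))))

mean_auc = {name: sum(v) / len(v) for name, v in val_auc.items()}
best = max(mean_auc, key=mean_auc.get)

# Refit the chosen strategy on all of D1, then re-estimate performance on D2.
final = fit_logistic(rows(X, d1), rows(y, d1), l1=strategies[best])
test_auc = auc(rows(y, d2), predict(final, rows(X, d2)))
print(mean_auc, best, round(test_auc, 3))
```

Keeping D2 out of the model comparison means the final AUC is estimated on data that played no part in choosing the strategy, which is the reason for the two-level split.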

RESULTS

A total of 13,425 patients met the criteria for the ever NGS-tested group, and 17,982 were included in the never NGS-tested group. The area under the receiver operating characteristic curve, evaluated on validation data, was similar across all models (77%-84%). Among those in the ever NGS-tested group, 84.08% (n=11,289) were early NGS-tested, and 15.91% (n=2136) were late NGS-tested. Factors associated with both ever having NGS testing and early NGS testing included later year of NSCLC diagnosis, no smoking history, and evidence of programmed death ligand 1 testing (all P<.05). Factors associated with a greater chance of never receiving NGS testing included older age, lower performance status, Black race, higher number of single-gene tests, public insurance, and treatment in a geography with Molecular Diagnostics Services Program adoption (all P<.05).
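As a quick arithmetic check on the group sizes above, the following Python sketch recomputes the shares from the reported counts. The helper name `pct` is hypothetical. Computed this way, the early share rounds to 84.09%, slightly different from the reported 84.08%; that figure may reflect truncation rather than rounding, which is an assumption and not something the abstract states.

```python
# Counts as reported in the abstract.
ever, never = 13_425, 17_982
early, late = 11_289, 2_136

total = ever + never
# The early and late subgroups should partition the ever-tested group.
assert early + late == ever

def pct(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole

print(f"cohort size: {total}")
print(f"ever NGS-tested: {pct(ever, total):.2f}% of cohort")
print(f"early: {pct(early, ever):.2f}%  late: {pct(late, ever):.2f}%")
```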

CONCLUSIONS

Predictors of ever versus never as well as early versus late NGS testing in the setting of advanced or metastatic NSCLC were consistent across machine learning methods in this study, demonstrating the ability of these models to identify factors that may predict NGS-based testing. There is a need to ensure that patients, regardless of age, race, insurance status, and geography (factors associated with lower odds of receiving NGS testing in this study), are provided with equitable access to NGS-based testing.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/be14/12198702/8833c7d3a38a/cancer_v11i1e64399_fig1.jpg
