Ghasemi Seyyed Mahmood, Gu Chunhui, Fahrmann Johannes F, Hanash Samir, Do Kim-Anh, Long James P, Irajizad Ehsan
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas.
Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, Texas.
Cancer Prev Res (Phila). 2025 Mar 3;18(3):117-123. doi: 10.1158/1940-6207.CAPR-24-0236.
In the cancer early detection field, logistic regression (LR) is frequently used to establish a combination rule that differentiates cancer from noncancer. However, LR is fit by maximum likelihood, which may not yield the optimal combination rule for maximizing sensitivity at a clinically desirable specificity, or vice versa. In this article, we develop an improved regression framework for binary classification, sensitivity maximization at a given specificity (SMAGS), which finds the linear decision rule that yields the maximum sensitivity for a given specificity or the maximum specificity for a given sensitivity. We additionally extend the framework to feature selection under the same sensitivity and specificity maximization criteria. We compare our SMAGS method with standard LR using two synthetic datasets and reported data for colorectal cancer from the 2018 CancerSEEK study. In the colorectal cancer CancerSEEK dataset, we report 14% improvement in sensitivity at 98.5% specificity (0.31 vs. 0.57; P value <0.05). The SMAGS method provides an alternative to LR for modeling combination rules for biomarkers and early detection applications.
Prevention Relevance: This study introduces a new machine learning methodology that identifies the optimal features and combination rules to maximize sensitivity at a fixed specificity, making it applicable to many existing biomarker prevention studies.
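The objective described in the abstract, sensitivity of a linear rule at a fixed specificity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sensitivity_at_specificity` and `fit_linear_rule` are hypothetical, the threshold convention (the smallest observed control score that keeps at least the requested fraction of controls negative) is an assumption, and the random-direction search is a crude stand-in for whatever optimizer SMAGS actually uses.

```python
import numpy as np


def sensitivity_at_specificity(scores, y, specificity):
    """Return (sensitivity, threshold) for the rule `score > threshold`.

    Assumed convention: the threshold is the smallest observed control score
    such that at least `specificity` of controls (y == 0) fall at or below it.
    """
    controls = np.sort(scores[y == 0])
    # Index of the control score that leaves >= `specificity` of controls negative.
    k = max(int(np.ceil(specificity * controls.size)) - 1, 0)
    threshold = controls[k]
    sensitivity = np.mean(scores[y == 1] > threshold)
    return float(sensitivity), float(threshold)


def fit_linear_rule(X, y, specificity, n_restarts=200, seed=0):
    """Crude random search over unit-norm weight vectors (illustrative only).

    The objective is piecewise constant in the weights, so gradient-based
    fitting does not apply directly; here we simply keep the best of
    `n_restarts` random directions.
    """
    rng = np.random.default_rng(seed)
    best_w, best_sens = None, -1.0
    for _ in range(n_restarts):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)
        sens, _ = sensitivity_at_specificity(X @ w, y, specificity)
        if sens > best_sens:
            best_w, best_sens = w, sens
    return best_w, best_sens


if __name__ == "__main__":
    # Synthetic two-marker example; the data are made up for illustration.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                   rng.normal(1.0, 1.0, size=(200, 2))])
    y = np.repeat([0, 1], 200)
    w, sens = fit_linear_rule(X, y, specificity=0.95)
    print("weights:", w, "sensitivity at 95% specificity:", sens)
```

Because sensitivity is evaluated only after the threshold is pinned to the control distribution, every candidate weight vector is compared at the same specificity, which is the property that a maximum likelihood fit does not target directly.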