A Novel Sensitivity Maximization at a Given Specificity Method for Binary Classifications.

Authors

Ghasemi Seyyed Mahmood, Gu Chunhui, Fahrmann Johannes F, Hanash Samir, Do Kim-Anh, Long James P, Irajizad Ehsan

Affiliations

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas.

Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, Texas.

Publication information

Cancer Prev Res (Phila). 2025 Mar 3;18(3):117-123. doi: 10.1158/1940-6207.CAPR-24-0236.

DOI:10.1158/1940-6207.CAPR-24-0236

PMID:39618306

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11875929/

Abstract

In the cancer early detection field, logistic regression (LR) is a frequently used approach to establish a combination rule that differentiates cancer from noncancer. However, the application of LR relies on a maximum likelihood approach, which may not yield optimal combination rules for maximizing sensitivity at a clinically desirable specificity and vice versa. In this article, we have developed an improved regression framework, sensitivity maximization at a given specificity (SMAGS), for binary classification that finds the linear decision rule, yielding the maximum sensitivity for a given specificity or the maximum specificity for a given sensitivity. We additionally expand the framework for feature selection that satisfies sensitivity and specificity maximizations. We compare our SMAGS method with normal LR using two synthetic datasets and reported data for colorectal cancer from the 2018 CancerSEEK study. In the colorectal cancer CancerSEEK dataset, we report 14% improvement in sensitivity at 98.5% specificity (0.31 vs. 0.57; P value <0.05). The SMAGS method provides an alternative to LR for modeling combination rules for biomarkers and early detection applications. Prevention Relevance: This study introduces a new machine learning methodology that identifies the optimal features and combination rules to maximize sensitivity at a fixed specificity, making it applicable to many existing biomarker prevention studies.
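The objective described in the abstract, choosing a linear decision rule that maximizes sensitivity while holding specificity at a fixed level, can be illustrated with a minimal sketch. This is not the authors' SMAGS implementation: the function names, the synthetic two-marker data, and the brute-force search over directions are all assumptions made here for illustration only.

```python
import numpy as np


def sensitivity_at_specificity(scores, y, specificity):
    """Sensitivity of the rule `score > t`, where t is the smallest observed
    control score that keeps empirical specificity >= `specificity`."""
    controls = np.sort(scores[y == 0])
    cases = scores[y == 1]
    # Number of controls that must fall at or below the threshold
    # (small epsilon guards against floating-point round-up).
    k = int(np.ceil(specificity * controls.size - 1e-9))
    t = controls[k - 1] if k > 0 else -np.inf
    return float(np.mean(cases > t)), float(t)


def best_linear_rule_2d(X, y, specificity, n_angles=360):
    """Illustrative brute-force search (hypothetical helper, not SMAGS itself):
    try unit directions w in 2-D and keep the one whose score X @ w gives the
    highest sensitivity at the requested specificity."""
    best_sens, best_w, best_t = -1.0, None, None
    for theta in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(theta), np.sin(theta)])
        sens, t = sensitivity_at_specificity(X @ w, y, specificity)
        if sens > best_sens:
            best_sens, best_w, best_t = sens, w, t
    return best_sens, best_w, best_t


# Synthetic data (assumed for the demo): two markers, 300 controls, 300 cases.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(300, 2)),         # controls
    rng.normal([1.0, 1.0], 1.0, size=(300, 2)),  # cases
])
y = np.concatenate([np.zeros(300, dtype=int), np.ones(300, dtype=int)])

target_spec = 0.985
single_marker_sens, _ = sensitivity_at_specificity(X[:, 0], y, target_spec)
combined_sens, w, t = best_linear_rule_2d(X, y, target_spec)
achieved_spec = float(np.mean(X[y == 0] @ w <= t))

print("first marker alone:", single_marker_sens)
print("best linear rule  :", combined_sens, "direction", w)
print("training specificity of best rule:", achieved_spec)
```

The sketch evaluates everything on the training data, so the reported sensitivity is optimistic; a held-out set or cross-validation would be needed for an honest estimate. The grid search also works only for two markers and stands in for whatever optimizer the paper actually uses.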

Similar articles

Prescription of Controlled Substances: Benefits and Risks

Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.

Diagnostic test accuracy of nutritional tools used to identify undernutrition in patients with colorectal cancer: a systematic review.

JBI Database System Rev Implement Rep. 2015 May 15;13(4):141-87. doi: 10.11124/jbisrir-2015-1673.

Plasma and cerebrospinal fluid amyloid beta for the diagnosis of Alzheimer's disease dementia and other dementias in people with mild cognitive impairment (MCI).

Cochrane Database Syst Rev. 2014 Jun 10;2014(6):CD008782. doi: 10.1002/14651858.CD008782.pub4.

Guaiac-based faecal occult blood tests versus faecal immunochemical tests for colorectal cancer screening in average-risk individuals.

Cochrane Database Syst Rev. 2022 Jun 6;6(6):CD009276. doi: 10.1002/14651858.CD009276.pub2.

Liver fibrosis stage based on the four factors (FIB-4) score or Forns index in adults with chronic hepatitis C.

Cochrane Database Syst Rev. 2024 Aug 13;8(8):CD011929. doi: 10.1002/14651858.CD011929.pub2.

123I-MIBG scintigraphy and 18F-FDG-PET imaging for diagnosing neuroblastoma.

Cochrane Database Syst Rev. 2015 Sep 29;2015(9):CD009263. doi: 10.1002/14651858.CD009263.pub2.

Transient elastography for diagnosis of stages of hepatic fibrosis and cirrhosis in people with alcoholic liver disease.

Cochrane Database Syst Rev. 2015 Jan 22;1(1):CD010542. doi: 10.1002/14651858.CD010542.pub2.

Systematic review and validation of prediction rules for identifying children with serious infections in emergency departments and urgent-access primary care.

Health Technol Assess. 2012;16(15):1-100. doi: 10.3310/hta16150.

References cited in this article

Evaluation of cell-free DNA approaches for multi-cancer early detection.

Cancer Cell. 2022 Dec 12;40(12):1537-1549.e12. doi: 10.1016/j.ccell.2022.10.022. Epub 2022 Nov 17.

Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set.

Ann Oncol. 2021 Sep;32(9):1167-1177. doi: 10.1016/j.annonc.2021.05.806. Epub 2021 Jun 24.

Machine learning prediction in cardiovascular diseases: a meta-analysis.

Sci Rep. 2020 Sep 29;10(1):16057. doi: 10.1038/s41598-020-72685-1.

The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.

BMC Genomics. 2020 Jan 2;21(1):6. doi: 10.1186/s12864-019-6413-7.

Detection and localization of surgically resectable cancers with a multi-analyte blood test.

Science. 2018 Feb 23;359(6378):926-930. doi: 10.1126/science.aar3247. Epub 2018 Jan 18.

Multitarget stool DNA testing for colorectal-cancer screening.

N Engl J Med. 2014 Apr 3;370(14):1287-97. doi: 10.1056/NEJMoa1311194. Epub 2014 Mar 19.

A boosting method for maximizing the partial area under the ROC curve.

BMC Bioinformatics. 2010 Jun 10;11:314. doi: 10.1186/1471-2105-11-314.

Use and misuse of the receiver operating characteristic curve in risk prediction.

Circulation. 2007 Feb 20;115(7):928-35. doi: 10.1161/CIRCULATIONAHA.106.672402.

Assessing the accuracy of prediction algorithms for classification: an overview.

Bioinformatics. 2000 May;16(5):412-24. doi: 10.1093/bioinformatics/16.5.412.

Exact logistic regression: theory and examples.

Stat Med. 1995 Oct 15;14(19):2143-60. doi: 10.1002/sim.4780141908.
