Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression.

Authors

Kim Soyeon, Baladandayuthapani Veerabhadran, Lee J Jack

Affiliations

Department of Statistics, Rice University, Houston, TX, USA.

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.

Publication information

Stat Biosci. 2017 Jun;9(1):217-245. doi: 10.1007/s12561-016-9169-5. Epub 2016 Sep 26.

DOI: 10.1007/s12561-016-9169-5

PMID: 28785367

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5543994/

Abstract

In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results. However, selecting the right amount of penalization is critical to simultaneously achieving these two goals. Standard approaches based on cross-validation (CV) typically provide high prediction accuracy with high true positive rates but at the cost of too many false positives. Alternatively, stability selection (SS) controls the number of false positives, but at the cost of yielding too few true positives. To circumvent these issues, we propose prediction-oriented marker selection (PROMISE), which combines SS with CV to conflate the advantages of both methods. Our application of PROMISE with the lasso and elastic net in data analysis shows that, compared to CV, PROMISE produces sparse solutions, few false positives, and small type I + type II error, and maintains good prediction accuracy, with a marginal decrease in the true positive rates. Compared to SS, PROMISE offers better prediction accuracy and true positive rates. In summary, PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize false positives and maximize prediction accuracy.
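The two ingredients named in the abstract, choosing the lasso penalty by cross-validation and measuring how often each variable is selected across random subsamples (the idea behind stability selection), can be illustrated with a small sketch. The abstract does not say how PROMISE combines them, so the code below shows only the two building blocks side by side and is not the authors' procedure. All function names, the synthetic data, the penalty grid, and the 0.8 frequency threshold are assumptions made for illustration.

```python
import random

def soft_threshold(z, g):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Lasso by coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:  # stop once a full sweep changes nothing noticeable
            break
    return beta

def cv_error(X, y, lam, k=5):
    """Mean squared prediction error of the lasso at `lam`, estimated by k-fold CV."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n

def selection_frequencies(X, y, lam, n_subsamples=30, seed=0):
    """Fraction of random half-subsamples in which each variable has a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]

def make_data(n=80, p=20, seed=1):
    """Hypothetical synthetic data: only variables 0 and 1 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y

if __name__ == "__main__":
    X, y = make_data()
    grid = [0.01, 0.05, 0.1, 0.2, 0.5]  # illustrative penalty grid
    lam_cv = min(grid, key=lambda lam: cv_error(X, y, lam))
    cv_selected = [j for j, b in enumerate(lasso_cd(X, y, lam_cv)) if b != 0.0]
    freqs = selection_frequencies(X, y, lam=0.2)
    stable = [j for j, f in enumerate(freqs) if f >= 0.8]
    print("CV-chosen lambda:", lam_cv, "-> selected variables:", cv_selected)
    print("Variables with selection frequency >= 0.8:", stable)
```

Comparing the two printed sets on data like this gives a feel for the trade-off the abstract describes: the penalty that minimizes CV error tends to admit extra variables, while a frequency threshold over subsamples keeps only those selected consistently.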

Similar articles

Identification of clinically relevant features in hypertensive patients using penalized regression: a case study of cardiovascular events.

Med Biol Eng Comput. 2019 Sep;57(9):2011-2026. doi: 10.1007/s11517-019-02007-9. Epub 2019 Jul 25.

Variable selection in social-environmental data: sparse regression and tree ensemble machine learning approaches.

BMC Med Res Methodol. 2020 Dec 10;20(1):302. doi: 10.1186/s12874-020-01183-9.

Repeated Sieving for Prediction Model Building with High-Dimensional Data.

J Pers Med. 2024 Jul 19;14(7):769. doi: 10.3390/jpm14070769.

Evaluation of the lasso and the elastic net in genome-wide association studies.

Front Genet. 2013 Dec 4;4:270. doi: 10.3389/fgene.2013.00270. eCollection 2013.

Regularized group regression methods for genomic prediction: Bridge, MCP, SCAD, group bridge, group lasso, sparse group lasso, group MCP and group SCAD.

BMC Proc. 2014 Oct 7;8(Suppl 5):S7. doi: 10.1186/1753-6561-8-S5-S7. eCollection 2014.

Robust estimation of the expected survival probabilities from high-dimensional Cox models with biomarker-by-treatment interactions in randomized clinical trials.

BMC Med Res Methodol. 2017 May 22;17(1):83. doi: 10.1186/s12874-017-0354-0.

Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models.

BMC Bioinformatics. 2020 Jul 2;21(1):277. doi: 10.1186/s12859-020-03618-y.

Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models.

Stat Med. 2016 Jul 10;35(15):2561-73. doi: 10.1002/sim.6927. Epub 2016 Mar 10.

Benefits of dimension reduction in penalized regression methods for high-dimensional grouped data: a case study in low sample size.

Bioinformatics. 2019 Oct 1;35(19):3628-3634. doi: 10.1093/bioinformatics/btz135.

Cited by

Prediction-oriented prognostic biomarker discovery with survival machine learning methods.

NAR Genom Bioinform. 2023 Jun 16;5(2):lqad055. doi: 10.1093/nargab/lqad055. eCollection 2023 Jun.

Statistical and Machine Learning Methods for Discovering Prognostic Biomarkers for Survival Outcomes.

Methods Mol Biol. 2023;2629:11-21. doi: 10.1007/978-1-0716-2986-4_2.

Measuring individual benefits of psychiatric treatment using longitudinal binary outcomes: Application to antipsychotic benefits in non-cannabis and cannabis users.

J Biopharm Stat. 2020 Sep 2;30(5):916-940. doi: 10.1080/10543406.2020.1765371. Epub 2020 Jun 8.

Collective effects of long-range DNA methylations predict gene expressions and estimate phenotypes in cancer.

Sci Rep. 2020 Mar 3;10(1):3920. doi: 10.1038/s41598-020-60845-2.

Germinal Immunogenetics predict treatment outcome for PD-1/PD-L1 checkpoint inhibitors.

Invest New Drugs. 2020 Feb;38(1):160-171. doi: 10.1007/s10637-019-00845-w. Epub 2019 Aug 11.

3' UTR shortening represses tumor-suppressor genes in trans by disrupting ceRNA crosstalk.

Nat Genet. 2018 Jun;50(6):783-789. doi: 10.1038/s41588-018-0118-8. Epub 2018 May 21.

