Unified methods for feature selection in large-scale genomic studies with censored survival outcomes.

Affiliations

Department of Statistical Science, Temple University.

Department of Biostatistics & Bioinformatics, Fox Chase Cancer Center, Temple University Health System, Philadelphia, PA, USA.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3409-3417. doi: 10.1093/bioinformatics/btaa161.

DOI:10.1093/bioinformatics/btaa161

PMID:32154833

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7267818/

Abstract

MOTIVATION

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous datasets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards (PH), which is unlikely to hold for each feature. When applied to genomic features exhibiting some form of non-proportional hazards (NPH), these methods could lead to an under- or over-estimation of the effects. We propose a broad array of marginal screening techniques that aid in feature ranking and selection by accommodating various forms of NPH. First, we develop an approach based on Kullback-Leibler information divergence and the Yang-Prentice model that includes methods for the PH and proportional odds (PO) models as special cases. Next, we propose R² measures for the PH and PO models that can be interpreted in terms of explained randomness. Lastly, we propose a generalized pseudo-R² index that includes PH, PO, crossing hazards and crossing odds models as special cases and can be interpreted as the percentage of separability between subjects experiencing the event and not experiencing the event according to feature measurements.
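
To make the baseline concrete, the following is a minimal sketch of marginal screening with the univariate Cox score test, i.e. the PH-based approach whose limitations motivate this work. It is not the authors' proposed method and not their R code. The function names (`cox_score_statistic`, `rank_features`, `simulate`), the simulated data and the effect size are all assumptions made for illustration; ties are handled in the Breslow manner.

```python
import math
import random


def cox_score_statistic(times, events, x):
    """Score test statistic for H0: beta = 0 in a univariate Cox PH model.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    x      : values of a single feature
    Under H0 the statistic is approximately chi-squared with 1 df.
    """
    # Walk through subjects from the latest time to the earliest, so the
    # running sums always describe the risk set {j : time_j >= t}.
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    n_risk, sum_x, sum_x2 = 0, 0.0, 0.0
    u, v = 0.0, 0.0
    pos = 0
    while pos < len(order):
        t = times[order[pos]]
        tied = []
        # Add every subject observed at time t to the risk set first.
        while pos < len(order) and times[order[pos]] == t:
            i = order[pos]
            n_risk += 1
            sum_x += x[i]
            sum_x2 += x[i] ** 2
            tied.append(i)
            pos += 1
        mean = sum_x / n_risk
        var = max(sum_x2 / n_risk - mean ** 2, 0.0)
        for i in tied:
            if events[i]:
                u += x[i] - mean  # observed minus risk-set average
                v += var          # risk-set variance of the feature
    return u * u / v if v > 0 else 0.0


def rank_features(times, events, features):
    """Rank features (dict: name -> list of values) by decreasing score statistic."""
    stats = {name: cox_score_statistic(times, events, vals)
             for name, vals in features.items()}
    return sorted(stats, key=stats.get, reverse=True)


def simulate(n=300, n_noise=5, beta=1.5, seed=1):
    """Hypothetical data: one feature with a PH effect plus pure-noise features."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    event_time = [rng.expovariate(math.exp(beta * s)) for s in signal]
    censor_time = [rng.expovariate(0.3) for _ in range(n)]
    times = [min(e, c) for e, c in zip(event_time, censor_time)]
    events = [1 if e <= c else 0 for e, c in zip(event_time, censor_time)]
    features = {"signal": signal}
    for k in range(n_noise):
        features[f"noise{k}"] = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return times, events, features


times, events, features = simulate()
print(rank_features(times, events, features))
```

Because the score statistic is derived under PH, a feature whose effect reverses over time (crossing hazards) can produce early and late contributions that cancel, leaving it ranked low despite a real association with survival. That is the kind of feature the NPH-aware measures described above are intended to recover.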

RESULTS

We evaluate the performance of our measures using extensive simulation studies and publicly available datasets in cancer genomics. We demonstrate that the proposed methods successfully address the issue of NPH in genomic feature selection and outperform existing methods.

AVAILABILITY AND IMPLEMENTATION

R code for the proposed methods is available at github.com/lburns27/Feature-Selection.

CONTACT

karthik.devarajan@fccc.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
