Unified methods for feature selection in large-scale genomic studies with censored survival outcomes.

Affiliations

Department of Statistical Science, Temple University.

Department of Biostatistics & Bioinformatics, Fox Chase Cancer Center, Temple University Health System, Philadelphia, PA, USA.

Publication information

Bioinformatics. 2020 Jun 1;36(11):3409-3417. doi: 10.1093/bioinformatics/btaa161.

Abstract

MOTIVATION

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous datasets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards (PH), which is unlikely to hold for each feature. When applied to genomic features exhibiting some form of non-proportional hazards (NPH), these methods could lead to an under- or over-estimation of the effects. We propose a broad array of marginal screening techniques that aid in feature ranking and selection by accommodating various forms of NPH. First, we develop an approach based on Kullback-Leibler information divergence and the Yang-Prentice model that includes methods for the PH and proportional odds (PO) models as special cases. Next, we propose R² measures for the PH and PO models that can be interpreted in terms of explained randomness. Lastly, we propose a generalized pseudo-R² index that includes PH, PO, crossing hazards and crossing odds models as special cases and can be interpreted as the percentage of separability between subjects experiencing the event and not experiencing the event according to feature measurements.
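
For orientation, the conventional baseline mentioned above (marginal screening with a univariate Cox model) can be sketched as follows. This is only an illustrative sketch in Python, not the authors' R implementation and not one of the proposed NPH-aware measures. It ranks features by the partial-likelihood score statistic for a single covariate, evaluated at a zero coefficient with Breslow handling of ties. The function names `cox_score_statistic` and `rank_features` and the toy data are assumptions made for this example.

```python
import numpy as np


def cox_score_statistic(time, event, x):
    """Score statistic for beta = 0 in a one-covariate Cox model (Breslow ties).

    Larger values suggest a stronger marginal association under PH.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    x = np.asarray(x, dtype=float)
    # at_risk[i, j] = 1 if subject j is still under observation at time[i]
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    n_risk = at_risk.sum(axis=1)
    # Risk-set mean and variance of the covariate at each subject's time
    mean_x = at_risk @ x / n_risk
    var_x = at_risk @ x**2 / n_risk - mean_x**2
    # Sum contributions over observed events only (censored rows are skipped)
    score = np.sum((x - mean_x)[event])
    info = np.sum(var_x[event])
    return float(score**2 / info) if info > 0 else 0.0


def rank_features(time, event, X):
    """Return feature indices sorted by decreasing score statistic, plus the statistics.

    X has shape (n_subjects, n_features); each column is screened on its own.
    """
    X = np.asarray(X, dtype=float)
    stats = np.array([cox_score_statistic(time, event, X[:, j])
                      for j in range(X.shape[1])])
    order = np.argsort(-stats, kind="stable")
    return order, stats


# Hypothetical toy data: 4 subjects, 3 features, no censoring
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 1, 1])
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
order, stats = rank_features(time, event, X)
print(order, stats)
```

In this toy example the first feature, whose high values coincide with the earliest events, receives the largest statistic, and the constant third feature receives zero. Because the statistic is derived under PH, it is exactly the kind of screen that can mis-rank features with non-proportional hazards.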

RESULTS

We evaluate the performance of our measures using extensive simulation studies and publicly available datasets in cancer genomics. We demonstrate that the proposed methods successfully address the issue of NPH in genomic feature selection and outperform existing methods.

AVAILABILITY AND IMPLEMENTATION

R code for the proposed methods is available at github.com/lburns27/Feature-Selection.

CONTACT

karthik.devarajan@fccc.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
