Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling.

Authors

Lee Sangkyun, Rahnenführer Jörg, Lang Michel, De Preter Katleen, Mestdagh Pieter, Koster Jan, Versteeg Rogier, Stallings Raymond L, Varesio Luigi, Asgharzadeh Shahab, Schulte Johannes H, Fielitz Kathrin, Schwermer Melanie, Morik Katharina, Schramm Alexander

Affiliations

Department of Computer Sciences, TU Dortmund University, Dortmund, Germany.

Department of Statistics, TU Dortmund University, Dortmund, Germany.

Publication information

PLoS One. 2014 Oct 8;9(10):e108818. doi: 10.1371/journal.pone.0108818. eCollection 2014.

DOI:10.1371/journal.pone.0108818

PMID:25295525

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4190101/

Abstract

Identifying relevant signatures for clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments considering samples with the same type of disease. The lack of a consensus is mostly due to the fact that sample sizes are far smaller than the numbers of candidate features to be considered, and therefore signature selection suffers from large variation. We propose a robust signature selection method that enhances the selection stability of penalized regression algorithms for predicting survival risk. Our method is based on an aggregation of multiple, possibly unstable, signatures obtained with the preconditioned lasso algorithm applied to random (internal) subsamples of a given cohort data, where the aggregated signature is shrunken by a simple thresholding strategy. The resulting method, RS-PL, is conceptually simple and easy to apply, relying on parameters automatically tuned by cross validation. Robust signature selection using RS-PL operates within an (external) subsampling framework to estimate the selection probabilities of features in multiple trials of RS-PL. These probabilities are used for identifying reliable features to be included in a signature. Our method was evaluated on microarray data sets from neuroblastoma, lung adenocarcinoma, and breast cancer patients, extracting robust and relevant signatures for predicting survival risk. Signatures obtained by our method achieved high prediction performance and robustness, consistently over the three data sets. Genes with high selection probability in our robust signatures have been reported as cancer-relevant. The ordering of predictor coefficients associated with signatures was well-preserved across multiple trials of RS-PL, demonstrating the capability of our method for identifying a transferable consensus signature. The software is available as an R package rsig at CRAN (http://cran.r-project.org).
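
The two-fold subsampling idea in the abstract (an internal loop that aggregates and thresholds signatures, wrapped in an external loop that estimates selection probabilities) can be illustrated with a minimal Python sketch. This is not the rsig package's API and not the authors' implementation. All function and parameter names below are hypothetical. The base selector is a simple correlation ranking standing in for the preconditioned lasso. The outcome is a continuous synthetic value rather than censored survival time. The threshold and signature size are fixed here, whereas the abstract says the method's parameters are tuned by cross validation.

```python
import numpy as np

def base_select(X, y, k):
    """Stand-in selector (assumption): top-k features by |correlation| with y.
    The paper uses the preconditioned lasso here instead."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[-k:]

def aggregate_signature(X, y, rng, n_inner=20, frac=0.632, k=10, threshold=0.5):
    """Internal subsampling: run the selector on random subsamples and keep
    features chosen in at least `threshold` of the runs (simple thresholding)."""
    n, p = X.shape
    counts = np.zeros(p)
    m = max(2, int(frac * n))
    for _ in range(n_inner):
        idx = rng.choice(n, size=m, replace=False)
        counts[base_select(X[idx], y[idx], k)] += 1
    return np.flatnonzero(counts / n_inner >= threshold)

def selection_probabilities(X, y, n_outer=30, frac=0.632, seed=0, **inner):
    """External subsampling: repeat the aggregated selection on random
    subsamples and return each feature's selection frequency."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    hits = np.zeros(p)
    m = max(2, int(frac * n))
    for _ in range(n_outer):
        idx = rng.choice(n, size=m, replace=False)
        hits[aggregate_signature(X[idx], y[idx], rng, **inner)] += 1
    return hits / n_outer

# Synthetic example: 80 samples, 200 features, only features 0 and 1 informative.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(80, 200))
y = 3.0 * X[:, 0] + 3.0 * X[:, 1] + 0.5 * data_rng.normal(size=80)

probs = selection_probabilities(X, y)
# Features ordered from highest to lowest estimated selection probability.
ranked = np.argsort(probs)[::-1]
```

In this toy setting, features with a high value in `probs` would be the candidates for a robust signature, mirroring how the abstract describes using selection probabilities to identify reliable features.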
