Lack of sufficiently strong informative features limits the potential of gene expression analysis as predictive tool for many clinical classification problems.

Author information

Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, USA.

Publication information

BMC Bioinformatics. 2011 Dec 1;12:463. doi: 10.1186/1471-2105-12-463.

Abstract

BACKGROUND

Our goal was to examine how various aspects of a gene signature influence the success of developing multi-gene prediction models. We inserted gene signatures into three real data sets by altering the expression level of existing probe sets. We varied the number of probe sets perturbed (signature size), the fold increase of mean probe set expression in perturbed compared to unperturbed data (signature strength), and the number of samples perturbed. Prediction models were trained to identify which cases had been perturbed. Performance was estimated using Monte Carlo cross-validation.
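The spike-in design described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes log2-scale expression values, uses simulated Gaussian background data in place of the real data sets, and substitutes a simple nearest-centroid rule with mean-difference feature selection for the prediction models, which the abstract does not specify. The function names `spike_signature` and `monte_carlo_cv` and all parameter values are hypothetical.

```python
import numpy as np

def spike_signature(log_expr, n_probes, fold, n_samples, rng):
    """Return a perturbed copy of log2 expression data and 0/1 sample labels."""
    n_total, n_features = log_expr.shape
    samples = rng.choice(n_total, size=n_samples, replace=False)
    probes = rng.choice(n_features, size=n_probes, replace=False)
    spiked = log_expr.copy()
    # A fold increase on the raw scale is an additive shift of log2(fold)
    # on the log2 scale (assumption: the data are log2-transformed).
    spiked[np.ix_(samples, probes)] += np.log2(fold)
    labels = np.zeros(n_total, dtype=int)
    labels[samples] = 1
    return spiked, labels

def monte_carlo_cv(X, y, n_splits, test_frac, n_top, rng):
    """Mean test accuracy over repeated stratified random train/test splits."""
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    accuracies = []
    for _ in range(n_splits):
        # Draw a fresh random test set from each class on every repetition.
        test = np.concatenate([
            rng.choice(idx, size=max(1, int(round(test_frac * idx.size))),
                       replace=False)
            for idx in (idx0, idx1)
        ])
        train_mask = np.ones(y.size, dtype=bool)
        train_mask[test] = False
        X_train, y_train = X[train_mask], y[train_mask]
        # Class means and feature ranking use training samples only.
        mu0 = X_train[y_train == 0].mean(axis=0)
        mu1 = X_train[y_train == 1].mean(axis=0)
        top = np.argsort(np.abs(mu1 - mu0))[-n_top:]
        # Nearest-centroid rule on the selected features (stand-in classifier).
        X_test = X[test][:, top]
        d0 = ((X_test - mu0[top]) ** 2).sum(axis=1)
        d1 = ((X_test - mu1[top]) ** 2).sum(axis=1)
        predictions = (d1 < d0).astype(int)
        accuracies.append((predictions == y[test]).mean())
    return float(np.mean(accuracies))

rng = np.random.default_rng(0)
# Simulated background: 100 samples by 500 probe sets (placeholder for real data).
background = rng.normal(loc=8.0, scale=1.0, size=(100, 500))
X, y = spike_signature(background, n_probes=10, fold=4.0, n_samples=30, rng=rng)
accuracy = monte_carlo_cv(X, y, n_splits=50, test_frac=0.3, n_top=10, rng=rng)
```

Varying `n_probes`, `fold`, and `n_samples` in this sketch corresponds to varying signature size, signature strength, and the number of perturbed samples.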

RESULTS

Signature strength had the greatest influence on predictor performance. It was possible to develop almost perfect predictors with as few as 10 features if the fold difference in mean expression values was greater than 2, even when the spiked samples represented only 10% of all samples. We also assessed the gene signature set size and strength for 9 real clinical prediction problems in six different breast cancer data sets.

CONCLUSIONS

We found sufficiently large and strong predictive signatures only for distinguishing ER-positive from ER-negative cancers; there were no strong signatures for more subtle prediction problems. Current statistical methods efficiently identify highly informative features in gene expression data if such features exist, and accurate models can be built with as few as 10 highly informative features. Features can be considered highly informative if at least a 2-fold expression difference exists between comparison groups, but such features do not appear to be common for many clinically relevant prediction problems in human data sets.
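As a rough illustration of the 2-fold criterion, the sketch below counts the features whose mean expression differs by at least a given fold between two comparison groups. It assumes log2-scale data, so the fold difference is computed from the difference of group means on the log scale (a ratio of geometric means); the abstract does not state how fold differences were calculated. The function name `count_informative_features` and the toy values are hypothetical.

```python
import numpy as np

def count_informative_features(log_expr, labels, min_fold=2.0):
    """Count features with at least `min_fold` difference between two groups.

    log_expr: samples x features array of log2 expression values (assumed).
    labels:   boolean-like vector, True for one comparison group.
    """
    labels = np.asarray(labels, dtype=bool)
    diff = log_expr[labels].mean(axis=0) - log_expr[~labels].mean(axis=0)
    # Convert the absolute log2 difference back to a fold difference.
    fold = 2.0 ** np.abs(diff)
    return int((fold >= min_fold).sum()), fold

# Toy data: 4 samples, 3 features; only the first feature differs strongly.
toy = np.array([
    [8.0, 5.0, 7.0],
    [8.0, 5.0, 7.0],
    [10.0, 5.0, 7.5],
    [10.0, 5.0, 7.5],
])
group = [False, False, True, True]
n_informative, fold_by_feature = count_informative_features(toy, group)
```

Here `n_informative` corresponds to a crude signature size and `fold_by_feature` to signature strength for the comparison.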





