

Quantifying stability in gene list ranking across microarray derived clinical biomarkers.

Affiliation

Bayer AG, Bayer Technology Services, 51368 Leverkusen, Germany.

Publication information

BMC Med Genomics. 2011 Oct 14;4:73. doi: 10.1186/1755-8794-4-73.

Abstract

BACKGROUND

Identifying stable gene lists for diagnosis, prognosis prediction, and treatment guidance of tumors remains a major challenge in cancer research. Microarrays measuring differential gene expression are widely used and should be versatile predictors of disease and other phenotypic data. However, gene expression profiling studies and the predictive biomarkers derived from them are often underpowered, require many samples to support sound statistics, or yield results that vary between studies. Given this inconsistency across similar studies, methods are needed that identify robust biomarkers from microarray data and thereby capture true biological information. Here we present a method demonstrating that gene list stability and predictive power depend not only on study size but also on the clinical phenotype.
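The abstract does not state how gene list stability is measured. One common approach, shown here as an assumption rather than the paper's procedure, is to rank genes by a two-sample statistic in each study and compare the resulting top-N lists. The sketch below ranks genes by absolute Welch t-statistic and reports the fraction of top-N genes that two studies share. The functions `top_genes` and `list_overlap`, and the simulated studies, are hypothetical.

```python
import numpy as np


def top_genes(X, y, n_top):
    """Indices of the n_top genes with the largest |Welch t| between classes.

    X : (n_samples, n_genes) expression matrix; y : binary labels in {0, 1}.
    Ranking by Welch t is an assumption, not the paper's stated method.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    return set(np.argsort(-np.abs(t))[:n_top].tolist())


def list_overlap(genes_a, genes_b):
    """Fraction of genes shared by two equally sized top-N gene lists."""
    return len(genes_a & genes_b) / len(genes_a)


rng = np.random.default_rng(0)


def simulate_study(n_per_class=20, n_genes=100, n_signal=5, shift=3.0):
    """Hypothetical study: only the first n_signal genes differ between classes."""
    y = np.repeat([0, 1], n_per_class)
    X = rng.normal(size=(2 * n_per_class, n_genes))
    X[y == 1, :n_signal] += shift  # true differential expression
    return X, y


study_1 = simulate_study()
study_2 = simulate_study()
overlap = list_overlap(top_genes(*study_1, n_top=5), top_genes(*study_2, n_top=5))
print(overlap)
```

Shrinking `shift` or `n_per_class` in this simulation typically lowers the overlap, which mirrors the dependence on study size described above. The phenotype dependence the abstract reports is not modeled here.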

RESULTS

Our method projects tumor gene expression data onto a lower-dimensional space that represents the main variation in the data. Part of the phenotype-related information resides in this low-dimensional space, and part resides in the residuum. We then introduce an information ratio (IR), a metric defined by how this information is partitioned between the projected and residual spaces. Grouping phenotypes such as tumor tissue, histological grade, relapse, or aging, we show that higher IR values correlate with phenotypes that yield less robust biomarkers, whereas lower IR values correspond to higher transferability across studies. Our results further indicate that the IR correlates with predictive accuracy. When tested across different published datasets, the IR can identify information-rich data that characterize clinical phenotypes and stable biomarkers.
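The abstract does not give the IR formula, so the following is only a minimal sketch of the general idea under explicit assumptions. PCA (via SVD) stands in for the projection onto the main variation, and the top `k` components span the "projected" space. The phenotype signal is taken to be the per-gene difference of class means for a binary phenotype. The hypothetical function `information_ratio` returns the squared norm of that signal in the residuum divided by its squared norm in the projected space, so a larger value means more of the phenotype information lies outside the main variation. It illustrates the partition idea and is not the authors' definition.

```python
import numpy as np


def information_ratio(X, y, k):
    """Hypothetical information ratio (IR); NOT the paper's exact formula.

    X : (n_samples, n_genes) expression matrix
    y : (n_samples,) binary phenotype labels in {0, 1}
    k : number of leading principal components spanning the projected space
    Returns ||residual signal||^2 / ||projected signal||^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xc = X - X.mean(axis=0)                      # center each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                                 # (n_genes, k) orthonormal basis
    # Assumed phenotype signal: per-gene difference of class means.
    d = Xc[y == 1].mean(axis=0) - Xc[y == 0].mean(axis=0)
    d_proj = V @ (V.T @ d)                       # part inside the main-variation subspace
    d_res = d - d_proj                           # part left in the residuum
    proj_energy = float(d_proj @ d_proj)
    if proj_energy == 0.0:
        return float("inf")
    return float(d_res @ d_res) / proj_energy


# Toy data: 20 samples, 3 genes, 10 samples per phenotype class.
y = np.repeat([0, 1], 10)
alt = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)  # balanced within each class
zeros = np.zeros(20)
aligned = np.column_stack([10 * y, alt, zeros])     # phenotype drives the top PC
orthogonal = np.column_stack([10 * alt, y, zeros])  # top PC is unrelated to phenotype
print(information_ratio(aligned, y, k=1))     # close to 0
print(information_ratio(orthogonal, y, k=1))  # very large
```

With these toy inputs, the phenotype that drives the dominant axis of variation gets an IR near zero, while the phenotype lying orthogonal to it gets a very large IR. On real data, both the choice of `k` and the choice of signal measure would change the values.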

CONCLUSIONS

The IR provides a quantitative metric for estimating the information content of gene expression data with respect to particular phenotypes.



Similar articles

1
Sensitivity analysis of gene ranking methods in phenotype prediction.
J Biomed Inform. 2016 Dec;64:255-264. doi: 10.1016/j.jbi.2016.10.012. Epub 2016 Oct 26.
2
Integrating biological knowledge with gene expression profiles for survival prediction of cancer.
J Comput Biol. 2009 Feb;16(2):265-78. doi: 10.1089/cmb.2008.12TT.
3
A probabilistic approach for automated discovery of perturbed genes using expression data from microarray or RNA-Seq.
Comput Biol Med. 2015 Dec 1;67:29-40. doi: 10.1016/j.compbiomed.2015.07.029. Epub 2015 Aug 14.
4
Diagnostic biomarkers for renal cell carcinoma: selection using novel bioinformatics systems for microarray data analysis.
Hum Pathol. 2009 Dec;40(12):1671-8. doi: 10.1016/j.humpath.2009.05.006. Epub 2009 Aug 19.
5
SplicerAV: a tool for mining microarray expression data for changes in RNA processing.
BMC Bioinformatics. 2010 Feb 25;11:108. doi: 10.1186/1471-2105-11-108.
6
Differential Coexpression Network Analysis for Gene Expression Data.
Methods Mol Biol. 2018;1754:155-165. doi: 10.1007/978-1-4939-7717-8_9.
7
Very Important Pool (VIP) genes--an application for microarray-based molecular signatures.
BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S9. doi: 10.1186/1471-2105-9-S9-S9.

