Simultaneous and exact interval estimates for the contrast of two groups based on an extremely high dimensional variable: application to mass spec data.

Author information

Park Yuhyun, Downing Sean R, Kim Dohyun, Hahn William C, Li Cheng, Kantoff Philip W, Wei L J

Affiliation

Department of Biostatistics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.

Publication information

Bioinformatics. 2007 Jun 15;23(12):1451-8. doi: 10.1093/bioinformatics/btm130. Epub 2007 Apr 25.

Abstract

MOTIVATION

Analysis of high-throughput proteomic/genomic data, in particular surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) data and microarray data, has led to a multitude of techniques aimed at identifying potential biomarkers. Most statistical techniques for comparing two groups rely on qualitative measures such as the P-value. A quantitative approach, such as interval estimation of the contrast between the two groups, is more appealing.

RESULTS

We have devised a simultaneous confidence bands method that uses a permutation scheme to detect potential biomarkers discriminating two treatment groups in high-dimensional datasets, while controlling the overall confidence coverage level. For example, for SELDI-TOF MS data, we treat the entire spectrum simultaneously and construct (1 - alpha) confidence bands for the mean differences between the groups. Peaks were then identified based on the maximal differences between the groups as determined by the confidence bands. The method therefore provides both qualitative information (P-value) and quantitative information (magnitude of difference). We analyzed the ovarian cancer dataset from the Clinical Proteomics Programs Databank and data from in-house samples containing known spiked-in proteins. We identified potential biomarkers similar to those described in previous analyses of the ovarian cancer data. However, although these markers are highly significant between the cancer and normal groups, our analysis indicated that the absolute difference between the two groups was minimal. In addition, we found markers beyond those previously described, with greater differences in average intensities. The proposed confidence bands method successfully detected the spiked-in peaks as well as secondary peaks generated by adducts and doubly charged species. We also illustrate our method on paired gene expression data from a prostate cancer microarray experiment by constructing confidence bands for the fold changes between cancer and normal samples.
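The permutation idea behind simultaneous bands can be sketched as follows. This is a generic max-|t| permutation construction written for illustration only; it is not the code in the seie R package and not the authors' exact-interval derivation, and it yields approximate rather than exact coverage. The function names (simultaneous_bands, candidate_peaks, paired_fold_change_bands), the group-centering step, the Welch-type standard error, the sign-flip scheme for paired data, the rule for ranking peaks and the synthetic data are all assumptions. The key point is that a single critical value, the (1 - alpha) quantile of the permutation distribution of the maximum standardized difference over all m/z points, is shared by the whole spectrum, which is what makes the bands hold simultaneously rather than point by point.

```python
import numpy as np

EPS = np.finfo(float).eps  # floor for standard errors of constant columns


def _welch_se(a, b):
    """Per-column standard error of mean(a) - mean(b) (unequal variances)."""
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.maximum(se, EPS)


def simultaneous_bands(x, y, alpha=0.05, n_perm=1000, seed=0):
    """Hypothetical helper: (1 - alpha) simultaneous bands for mean(x) - mean(y).

    x: (n1, p), y: (n2, p); rows are samples, columns are m/z points or genes.
    Assumes group-centered residuals are exchangeable between the two groups.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1 = len(x)
    diff = x.mean(axis=0) - y.mean(axis=0)
    se = _welch_se(x, y)
    # Remove each group's mean so the pooled rows carry no group effect.
    resid = np.vstack([x - x.mean(axis=0), y - y.mean(axis=0)])
    max_t = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(resid))  # random relabelling of samples
        r1, r2 = resid[idx[:n1]], resid[idx[n1:]]
        t = np.abs(r1.mean(axis=0) - r2.mean(axis=0)) / _welch_se(r1, r2)
        max_t[b] = t.max()  # supremum over the whole spectrum
    crit = np.quantile(max_t, 1 - alpha)  # one critical value for all columns
    return diff - crit * se, diff + crit * se, crit


def candidate_peaks(lower, upper, top=5):
    """Hypothetical rule: columns whose band excludes 0, ranked by the
    smallest absolute difference the band still allows."""
    guaranteed = np.maximum(lower, -upper)  # > 0 only where the band excludes 0
    idx = np.flatnonzero(guaranteed > 0)
    return idx[np.argsort(-guaranteed[idx])][:top]


def paired_fold_change_bands(case, control, alpha=0.05, n_perm=1000, seed=0):
    """Hypothetical helper: simultaneous bands for the geometric-mean fold
    change case/control. Rows are matched pairs. Assumes positive intensities
    and log2 ratios symmetric about their mean, so random sign flips of the
    centered ratios approximate the max-|t| distribution."""
    rng = np.random.default_rng(seed)
    d = np.log2(np.asarray(case, float)) - np.log2(np.asarray(control, float))
    n = len(d)
    mean = d.mean(axis=0)
    se = np.maximum(d.std(axis=0, ddof=1) / np.sqrt(n), EPS)
    centered = d - mean
    max_t = np.empty(n_perm)
    for b in range(n_perm):
        z = centered * rng.choice([-1.0, 1.0], size=(n, 1))  # flip whole pairs
        s = np.maximum(z.std(axis=0, ddof=1) / np.sqrt(n), EPS)
        max_t[b] = np.max(np.abs(z.mean(axis=0)) / s)
    crit = np.quantile(max_t, 1 - alpha)
    # Back-transform the log2 band to the fold-change scale.
    return 2.0 ** (mean - crit * se), 2.0 ** (mean + crit * se)


# Usage on synthetic data (hypothetical: 20 vs 20 spectra, 200 m/z points,
# one spiked-in difference at column 50).
rng = np.random.default_rng(1)
x = rng.normal(size=(20, 200))
y = rng.normal(size=(20, 200))
x[:, 50] += 3.0
lower, upper, crit = simultaneous_bands(x, y, n_perm=500)
peaks = candidate_peaks(lower, upper)
print(peaks)  # indices whose band excludes 0, largest guaranteed difference first

# Paired example (hypothetical: 12 matched pairs, 100 genes, gene 7 up 4-fold).
control = rng.lognormal(size=(12, 100))
case = control * 2.0 ** rng.normal(0.0, 0.3, size=(12, 100))
case[:, 7] *= 4.0
fc_lo, fc_hi = paired_fold_change_bands(case, control, n_perm=500)
print(fc_lo[7], fc_hi[7])
```

Ranking candidates by the band limit nearest zero, rather than by P-value, mirrors the abstract's observation that a highly significant marker can still have only a small absolute difference between groups.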

AVAILABILITY

The R package 'seie.zip' (license: GNU GPL) is publicly available at http://research2.dfci.harvard.edu/dfci/MS_spike-in_data/
