Simultaneous and exact interval estimates for the contrast of two groups based on an extremely high dimensional variable: application to mass spec data.

Author information

Park Yuhyun, Downing Sean R, Kim Dohyun, Hahn William C, Li Cheng, Kantoff Philip W, Wei L J

Affiliation

Department of Biostatistics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.

Publication information

Bioinformatics. 2007 Jun 15;23(12):1451-8. doi: 10.1093/bioinformatics/btm130. Epub 2007 Apr 25.

DOI:10.1093/bioinformatics/btm130

PMID:17459967

Abstract

MOTIVATION

Analysis of high-throughput proteomic and genomic data, in particular surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) data and microarray data, has led to a multitude of techniques aimed at identifying potential biomarkers. Most statistical techniques for comparing two groups rely on qualitative measures such as P-values. A quantitative approach, such as interval estimation for the contrast between two groups, is more appealing.

RESULTS

We have devised a permutation-based method for constructing simultaneous confidence bands that detects potential biomarkers discriminating two treatment groups in high-dimensional datasets while controlling the overall confidence coverage level. For example, for SELDI-TOF MS data, we treat the entire spectrum simultaneously and construct (1 - alpha) confidence bands for the mean differences between groups. Peaks were then identified where the confidence bands indicated the largest differences between the groups. The method therefore provides both qualitative (P-value) and quantitative (magnitude of difference) information. We analyzed the Clinical Proteomics Programs Databank's ovarian cancer dataset and data from in-house samples containing known spiked-in proteins. We identified potential biomarkers similar to those described in previous analyses of the ovarian cancer data; however, although these markers are highly significant between the cancer and normal groups, our analysis indicated that the absolute difference between the two groups was minimal. In addition, we found markers beyond those previously described that showed greater differences in average intensities. The proposed confidence band method successfully detected the spiked-in peaks as well as secondary peaks generated by adducts and doubly charged species. We also illustrate the method on paired gene expression data from a prostate cancer microarray experiment by constructing confidence bands for the fold changes between cancer and normal samples.
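
The band construction described in the results can be illustrated with a small sketch. It is not the authors' seie package: the studentized max-|t| statistic, the permutation of within-group residuals, and the names `simultaneous_band` and `_diff_and_se` are assumptions chosen for illustration. For each feature (for example, one m/z point of a spectrum) it estimates the mean difference, calibrates a single critical value c from the permutation distribution of the maximum statistic over all features, and returns diff ± c·se, so that all intervals hold jointly at roughly the 1 - alpha level. Because it permutes estimated residuals, its coverage is approximate; the exact construction in the paper may differ.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

EPS = 1e-12  # avoids division by zero for features with no variance


def _diff_and_se(rows_a: Sequence[Sequence[float]],
                 rows_b: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-feature mean difference (A - B) and pooled-variance standard error."""
    n_a, n_b = len(rows_a), len(rows_b)
    out = []
    for xa, xb in zip(zip(*rows_a), zip(*rows_b)):
        ma, mb = statistics.fmean(xa), statistics.fmean(xb)
        ss = sum((x - ma) ** 2 for x in xa) + sum((x - mb) ** 2 for x in xb)
        pooled_var = ss / (n_a + n_b - 2)
        out.append((ma - mb, math.sqrt(pooled_var * (1 / n_a + 1 / n_b))))
    return out


def simultaneous_band(group_a: Sequence[Sequence[float]],
                      group_b: Sequence[Sequence[float]],
                      alpha: float = 0.05,
                      n_perm: int = 1000,
                      seed: int = 0) -> List[Tuple[float, float, float]]:
    """(lower, estimate, upper) for mean(A) - mean(B) at every feature.

    Each row is one sample; each column is one feature (e.g. an m/z point).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = _diff_and_se(group_a, group_b)

    # Hypothetical calibration step: subtract each group's feature means so the
    # pooled residuals can be treated as exchangeable between the two groups.
    means_a = [statistics.fmean(col) for col in zip(*group_a)]
    means_b = [statistics.fmean(col) for col in zip(*group_b)]
    resid = [[x - m for x, m in zip(row, means_a)] for row in group_a]
    resid += [[x - m for x, m in zip(row, means_b)] for row in group_b]

    # Permutation distribution of the maximum |t| over ALL features; whole rows
    # are shuffled, so the correlation between features is kept intact.
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(resid)
        stats = _diff_and_se(resid[:n_a], resid[n_a:])
        max_stats.append(max(abs(d) / max(s, EPS) for d, s in stats))
    max_stats.sort()

    # One critical value shared by every feature gives simultaneous coverage.
    c = max_stats[min(n_perm, math.ceil((1 - alpha) * n_perm)) - 1]
    return [(d - c * s, d, d + c * s) for d, s in observed]
```

Features whose band excludes zero (`lo > 0 or hi < 0`) are candidate markers, and ranking them by `abs(est)` reflects the abstract's emphasis on the magnitude of a difference rather than only its P-value.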

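For the paired microarray example, one possible sketch works with log2 ratios within each cancer/normal pair and calibrates the band by random sign flips of the centered ratios, applying the same flip to every gene of a pair so that between-gene correlation is preserved. The log2 scale, the sign-flip scheme and the name `paired_fold_change_band` are assumptions for illustration, not the published method; the band is mapped back to the fold-change scale at the end.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def paired_fold_change_band(cancer: Sequence[Sequence[float]],
                            normal: Sequence[Sequence[float]],
                            alpha: float = 0.05,
                            n_flip: int = 1000,
                            seed: int = 0) -> List[Tuple[float, float, float]]:
    """(lower, estimate, upper) fold change cancer/normal for every gene.

    Row i of `cancer` and row i of `normal` come from the same subject;
    expression values must be strictly positive (log2 is taken).
    """
    rng = random.Random(seed)
    n = len(cancer)  # number of pairs, must be >= 2
    # Within-pair log2 ratios: one row per pair, one column per gene.
    ratios = [[math.log2(c) - math.log2(m) for c, m in zip(rc, rn)]
              for rc, rn in zip(cancer, normal)]
    cols = list(zip(*ratios))
    mean = [statistics.fmean(col) for col in cols]
    se = [statistics.stdev(col) / math.sqrt(n) for col in cols]
    resid = [[x - m for x, m in zip(row, mean)] for row in ratios]

    # Sign-flip calibration (an assumption): flip all genes of a pair together
    # and record the maximum studentized mean over genes.
    max_stats = []
    for _ in range(n_flip):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        flipped_cols = zip(*[[s * x for x in row] for s, row in zip(signs, resid)])
        stat = 0.0
        for col in flipped_cols:
            s = statistics.stdev(col) / math.sqrt(n)
            stat = max(stat, abs(statistics.fmean(col)) / max(s, 1e-12))
        max_stats.append(stat)
    max_stats.sort()

    c = max_stats[min(n_flip, math.ceil((1 - alpha) * n_flip)) - 1]
    # Band on the log2 scale, mapped back to fold changes.
    return [(2 ** (m - c * s), 2 ** m, 2 ** (m + c * s)) for m, s in zip(mean, se)]
```

A gene whose lower bound exceeds 1 (or whose upper bound is below 1) shows a fold change that is distinguishable from no change across the whole gene set at once.
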
AVAILABILITY

The R package 'seie.zip' (license: GNU GPL) is publicly available at http://research2.dfci.harvard.edu/dfci/MS_spike-in_data/
