Rudnick Paul A, Markey Sanford P, Roth Jeri, Mirokhin Yuri, Yan Xinjian, Tchekhovskoi Dmitrii V, Edwards Nathan J, Thangudu Ratna R, Ketchum Karen A, Kinsinger Christopher R, Mesri Mehdi, Rodriguez Henry, Stein Stephen E
Spectragen Informatics, Bainbridge Island, Washington 98110, United States.
Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States.
J Proteome Res. 2016 Mar 4;15(3):1023-32. doi: 10.1021/acs.jproteome.5b01091. Epub 2016 Feb 25.
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced large proteomics data sets from the mass spectrometric interrogation of tumor samples previously analyzed by The Cancer Genome Atlas (TCGA) program. The availability of both genomic and proteomic data enables proteogenomic study of both reference (i.e., contained in major sequence databases) and nonreference markers of cancer. The CPTAC laboratories focused on colon, breast, and ovarian tissues in the first round of analyses; spectra in these data sets were produced by 2D liquid chromatography-tandem mass spectrometry analyses and represent deep coverage. To reduce the variability introduced by disparate data analysis platforms (e.g., software packages, versions, parameters, and sequence databases), the CPTAC Common Data Analysis Platform (CDAP) was created. The CDAP produces both peptide-spectrum-match (PSM) reports and gene-level reports. The pipeline processes raw mass spectrometry data in four steps: (1) peak-picking and quantitative data extraction, (2) database searching, (3) gene-based protein parsimony, and (4) false-discovery rate-based filtering. For the phosphopeptide enrichment studies, the pipeline also produces localization scores using the PhosphoRS program. Quantitative information for each data set is specific to the sample processing: the PSM and protein reports contain spectrum-level or gene-level ("rolled-up") precursor peak areas and spectral counts for label-free experiments, or reporter ion log-ratios for 4plex iTRAQ experiments. The reports are available in simple tab-delimited formats and, for the PSM reports, in mzIdentML. The goal of the CDAP is to provide standard, uniform reports for all of the CPTAC data to enable comparisons between different samples and cancer types as well as across the major omics fields.
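As a rough illustration of the last pipeline step and the gene-level roll-up for label-free data, the following Python sketch filters a tab-delimited PSM table by false-discovery rate and then sums spectral counts and precursor peak areas per gene. It is not the CDAP implementation. The column names (`spectrum`, `gene`, `score`, `is_decoy`, `precursor_area`), the example rows, the target-decoy estimate of the false-discovery rate, and the choice to sum peak areas are all assumptions made for this example; the abstract does not specify them. Gene-based protein parsimony is not shown.

```python
import csv
import io
from collections import defaultdict

# Hypothetical PSM report. Column names and values are illustrative only
# and do not reproduce the actual CDAP report schema.
PSM_TSV = (
    "spectrum\tpeptide\tgene\tscore\tis_decoy\tprecursor_area\n"
    "s1\tAAAK\tTP53\t95.0\t0\t1000.0\n"
    "s2\tCCCK\tTP53\t90.0\t0\t500.0\n"
    "s3\tDDDK\tBRCA1\t80.0\t0\t200.0\n"
    "s4\tEEEK\tDECOY1\t70.0\t1\t50.0\n"
    "s5\tFFFK\tBRCA1\t60.0\t0\t300.0\n"
    "s6\tGGGK\tDECOY2\t50.0\t1\t10.0\n"
)


def read_psms(text):
    """Parse a tab-delimited PSM table into a list of typed dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "spectrum": row["spectrum"],
            "gene": row["gene"],
            "score": float(row["score"]),
            "is_decoy": row["is_decoy"] == "1",
            "precursor_area": float(row["precursor_area"]),
        }
        for row in reader
    ]


def filter_by_fdr(psms, max_fdr):
    """Keep target PSMs down to the lowest score at which decoys/targets <= max_fdr.

    Assumes a target-decoy search where a higher score is better.
    """
    ranked = sorted(psms, key=lambda p: p["score"], reverse=True)
    targets = 0
    decoys = 0
    keep = 0  # number of top-ranked PSMs inside the accepted region
    for i, psm in enumerate(ranked, start=1):
        if psm["is_decoy"]:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            keep = i
    return [p for p in ranked[:keep] if not p["is_decoy"]]


def roll_up_by_gene(psms):
    """Sum spectral counts and precursor peak areas per gene (assumed roll-up rule)."""
    summary = defaultdict(lambda: {"spectral_count": 0, "precursor_area": 0.0})
    for p in psms:
        summary[p["gene"]]["spectral_count"] += 1
        summary[p["gene"]]["precursor_area"] += p["precursor_area"]
    return dict(summary)


if __name__ == "__main__":
    accepted = filter_by_fdr(read_psms(PSM_TSV), max_fdr=0.25)
    for gene, values in sorted(roll_up_by_gene(accepted).items()):
        print(gene, values["spectral_count"], values["precursor_area"])
```

For iTRAQ data sets, the per-gene quantity would be a reporter ion log-ratio rather than a summed peak area, so the roll-up function would need a different aggregation rule.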