Uszkoreit Julian, Barkovits Katalin, Pacharra Sandra, Pfeiffer Kathy, Steinbach Simone, Marcus Katrin, Eisenacher Martin
Medical Faculty, Medical Proteome Center, Ruhr University Bochum, Bochum 44801, Germany.
Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum 44801, Germany.
Data Brief. 2022 Jul 4;43:108435. doi: 10.1016/j.dib.2022.108435. eCollection 2022 Aug.
In this article, we present a data-dependent acquisition (DDA) dataset that was generated as a reference and ground-truth quantitative dataset. While initially used to compare samples measured with DDA and data-independent acquisition (DIA) (Barkovits et al., 2020), the presented dataset holds potential value as a benchmark reference for any workflow that processes DDA data. The entire dataset consists of 15 LC-MS/MS measurements covering five distinct spike-in states, each with three replicates. To generate the dataset, a C2C12 (immortalized mouse myoblast) cell lysate was used as a complex background for five different states, which were simulated by spiking in 13 defined proteins at different concentrations. For this purpose, a constant amount of 20 µg of cell lysate was used for all samples, and different amounts of the 13 selected proteins, ranging from 0.1 to 10 pmol, were added, reflecting physiological amounts of proteins. Afterwards, all samples were tryptically digested using the same method. From each sample, 200 ng of tryptic peptides were measured in triplicate on a Q Exactive HF (Thermo Fisher Scientific). The mass range for MS1 was set to 350-1400 m/z with a resolution of 60,000 at 200 m/z. HCD fragmentation of the 10 most abundant precursor ions (Top10) was performed at 27% NCE. The fragment analysis (MS2) was performed with a resolution of 30,000 at 200 m/z. In addition to the raw files, the dataset contains centroided mzML files and spectrum identification results for peptide identifications performed by Mascot (Perkins et al., 1999), MS-GF+ (Kim et al., 2010) and X!Tandem (Craig and Beavis, 2004) for each separate MS analysis. The corresponding FASTA file containing the protein sequences, a combination of all identification runs performed by PIA (Uszkoreit et al., 2019, 2015), and a peptide and protein quantification performed by OpenMS (Pfeuffer et al., 2017) are also included.
All data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (Perez-Riverol et al., 2018) with the dataset identifier PXD012986.
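As a rough illustration of how such a spike-in design can serve as ground truth, the following Python sketch enumerates the 15 runs (five states with three replicates each) and compares an observed abundance ratio with the ratio expected from the spiked amounts. The state labels, function names, and the example intensities are hypothetical and are not taken from the deposited files; the actual per-state amounts of each protein must be read from the dataset description.

```python
from itertools import product
from math import log2

# Hypothetical state labels; the deposited files may use different names.
STATES = ["S1", "S2", "S3", "S4", "S5"]
REPLICATES = [1, 2, 3]


def run_labels(states=STATES, replicates=REPLICATES):
    """Enumerate one label per LC-MS/MS run (state x replicate)."""
    return [f"{state}_rep{rep}" for state, rep in product(states, replicates)]


def expected_log2_ratio(amount_a_pmol, amount_b_pmol):
    """Expected log2 fold change of a spiked protein between two states."""
    return log2(amount_a_pmol / amount_b_pmol)


def log2_error(observed_a, observed_b, amount_a_pmol, amount_b_pmol):
    """Deviation of the observed log2 ratio from the spike-in expectation."""
    observed = log2(observed_a / observed_b)
    return observed - expected_log2_ratio(amount_a_pmol, amount_b_pmol)


if __name__ == "__main__":
    print(len(run_labels()), "runs")
    # Largest ratio possible within the stated 0.1-10 pmol range.
    print(round(expected_log2_ratio(10, 0.1), 2))
    # Made-up intensities for a protein spiked at 2 pmol vs. 1 pmol.
    print(log2_error(200.0, 100.0, 2.0, 1.0))
```

Because the C2C12 background is held constant at 20 µg, background proteins are expected to show a log2 ratio near zero between any two states, so the same error function can be applied to them with equal amounts.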