Department of Biology, University of North Dakota, Grand Forks, ND, 58202, USA.
Departments of Structural Biology and Developmental Neurobiology, Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA.
Nat Commun. 2022 Feb 8;13(1):744. doi: 10.1038/s41467-022-28411-8.
The integration of genomics and proteomics data (proteogenomics) holds the promise of deepening our understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS) and aligns the MS-based proteomic samples with genomic samples using two discriminant scores. Theoretical analysis with simulated data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in large-scale MS-based proteogenomics studies. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based version can be accessed at https://smap.shinyapps.io/smap/.
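The matching idea described above can be illustrated with a minimal sketch. The abstract does not define the two discriminant scores, so the definitions here are assumptions made for illustration only: the first score is taken to be the genotype concordance between an MS-inferred sample and its best genomic candidate, and the second is the margin between the best and second-best candidates. The function names, the 0/1/2 genotype encoding, and the use of `None` for unobserved sites are all hypothetical and are not taken from the SMAP code base.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: 0, 1, or 2 copies of the alternate allele;
# None marks a variant site that was not observed in that sample.
Genotype = Optional[int]


def concordance(ms_calls: List[Genotype], dna_calls: List[Genotype]) -> float:
    """Fraction of jointly observed variant sites where the two calls agree."""
    pairs = [
        (m, d)
        for m, d in zip(ms_calls, dna_calls)
        if m is not None and d is not None
    ]
    if not pairs:
        return 0.0
    return sum(m == d for m, d in pairs) / len(pairs)


def match_sample(
    ms_calls: List[Genotype],
    genomic: Dict[str, List[Genotype]],
) -> Tuple[str, float, float]:
    """Return (best genomic sample, assumed score 1, assumed score 2).

    Assumed score 1: concordance with the best-matching genomic sample.
    Assumed score 2: margin between the best and second-best concordance.
    """
    ranked = sorted(
        ((concordance(ms_calls, calls), name) for name, calls in genomic.items()),
        reverse=True,
    )
    best_score, best_name = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    return best_name, best_score, best_score - runner_up


if __name__ == "__main__":
    # Toy genotypes at five variant sites for three genomic samples.
    genomic_samples = {
        "G1": [0, 1, 2, 0, 1],
        "G2": [2, 1, 0, 0, 2],
        "G3": [1, 1, 1, 1, 1],
    }
    # MS-inferred genotypes for one proteomic sample; site 2 was not detected.
    proteomic_sample = [2, None, 0, 0, 2]
    print(match_sample(proteomic_sample, genomic_samples))
    # prints ('G2', 1.0, 0.75)
```

Under these assumptions, a proteomic sample whose labeled genomic partner is not the top-ranked candidate, or whose margin is small, would be flagged for review; the thresholds used for that decision are not given in the abstract and are left out of the sketch.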