Construction and assessment of individualized proteogenomic databases for large-scale analysis of nonsynonymous single nucleotide variants.

Authors

Krug Karsten, Popic Sasa, Carpy Alejandro, Taumer Christoph, Macek Boris

Affiliation

Proteome Center Tuebingen, University of Tuebingen, Germany.

Publication

Proteomics. 2014 Dec;14(23-24):2699-708. doi: 10.1002/pmic.201400219. Epub 2014 Nov 17.

Abstract

Next-generation sequencing projects focusing on genomes and transcriptomes identify millions of single nucleotide variants (SNVs), many of which result in single amino acid substitutions. These nonsynonymous (ns) SNVs are typically not incorporated into protein sequence databases used to identify MS/MS data. Here, we perform a comparative analysis of the assembly of nsSNV-containing proteogenomic databases. We use a comprehensive transcriptome and proteome dataset of HeLa cells from the literature to derive and to incorporate SNVs into databases applicable to proteomics search engines, and to assess their performance in the identification of nsSNVs. We assemble the databases by (1) translation of SNV-containing transcripts into all possible reading frames, (2) translation of predicted reading frame, (3) prediction of nsSNVs and subsequent incorporation into canonical protein sequences. We show substantial differences between generated databases in terms of represented nsSNVs and theoretical search space, affecting sensitivity and specificity of database search. We query the databases with >2.2M high-resolution MS/MS spectra using MaxQuant software and identify 451 variant peptides, containing 401 nsSNVs. We conclude that prediction of reading frame and, if applicable, SNV effect result in comprehensive yet compact databases necessary to retain sensitivity in large-scale analysis of nsSNVs called from transcriptomics data.
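To make the three assembly strategies more concrete, the following is a minimal Python sketch of strategy (1), six-frame translation, and strategy (3), predicting whether an SNV is nonsynonymous and incorporating it into a protein sequence. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`translate`, `six_frame`, `apply_snv`, `classify_snv`), the toy coding sequence, and the 0-based SNV coordinates are all hypothetical, and only the standard genetic code is used.

```python
# Illustrative sketch only; not the pipeline used in the paper.
# Standard genetic code, built from the usual TCAG-ordered amino acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Strategy (1): translate all three frames on both strands."""
    rev = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (seq, rev) for f in range(3)]


def apply_snv(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def classify_snv(cds, pos, alt):
    """Strategy (3): return (ref_aa, 1-based residue, alt_aa), or None if synonymous.

    Assumes cds starts in frame at position 0 (hypothetical input convention).
    """
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = apply_snv(ref_codon, pos % 3, alt)
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return None
    return ref_aa, start // 3 + 1, alt_aa


cds = "ATGGCTAAAGAATAA"                  # toy coding sequence
print(translate(cds))                    # MAKE*
print(classify_snv(cds, 7, "G"))         # ('K', 3, 'R')  nonsynonymous
print(classify_snv(cds, 5, "C"))         # None           synonymous
print(translate(apply_snv(cds, 7, "G"))) # MARE*
print(len(six_frame(cds)))               # 6
```

In this toy setting, six-frame translation produces six candidate sequences per transcript, whereas predicting the SNV effect adds a single variant entry to the canonical protein. This is a small-scale picture of the search-space difference the abstract describes.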
