Rescuing discarded spectra: Full comprehensive analysis of a minimal proteome.

Authors

Lluch-Senar Maria, Mancuso Francesco M, Climente-González Héctor, Peña-Paz Marcia I, Sabido Eduard, Serrano Luis

Affiliations

EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain.

Universitat Pompeu Fabra (UPF), Barcelona, Spain.

Publication information

Proteomics. 2016 Feb;16(4):554-63. doi: 10.1002/pmic.201500187.

Abstract

A common problem encountered when performing large-scale MS proteome analysis is the loss of information due to the high percentage of unassigned spectra. To determine the causes behind this loss, we analyzed the proteome of one of the smallest living bacteria that can be grown axenically, Mycoplasma pneumoniae (729 ORFs). The proteome of M. pneumoniae cells grown in defined media was analyzed by MS. An initial search using Mascot against a species-specific NCBInr database with common contaminants (NCBImpn) left around 79% of the acquired spectra without an assignment. The percentage of non-assigned spectra was reduced to 27% after re-analysis of the data with the PEAKS software, thereby increasing the proteome coverage of M. pneumoniae from the initial 60% to over 76%. Nonetheless, 33,413 spectra with assigned amino acid sequences could not be mapped to any NCBInr database protein sequence. Approximately 1% of these unassigned peptides corresponded to PTMs and 4% to M. pneumoniae protein variants (deamidation and translation inaccuracies). The most abundant peptide sequence variants (Phe-Tyr and Ala-Ser) could be explained by alterations in the editing capacity of the corresponding aminoacyl-tRNA synthetases. About another 1% of the peptides not associated with any protein had repetitions of the same aromatic/hydrophobic amino acid at the N-terminus, or had Arg/Lys at the C-terminus. Thus, in a model system, we have maximized the number of assigned spectra to 73% (51,453 out of the 70,040 initially acquired spectra). All MS data have been deposited in ProteomeXchange with the identifier PXD002779 (http://proteomecentral.proteomexchange.org/dataset/PXD002779).

