Eng Jimmy K, Deutsch Eric W
Proteomics Resource, University of Washington, Seattle, WA, 98195, USA.
Institute for Systems Biology, Seattle, WA, 98109, USA.
Proteomics. 2020 Nov;20(21-22):e1900362. doi: 10.1002/pmic.201900362. Epub 2020 Apr 2.
Protein identification by tandem mass spectrometry sequence database searching is standard practice in many proteomics laboratories. The de facto standard for representing the sequence databases used as input to search tools is the FASTA format. The Human Proteome Organization's Proteomics Standards Initiative (PSI) has developed an extension to FASTA, the PSI extended FASTA format (PEFF), in which additional information such as structural annotations is encoded in the protein description lines. Comet has been extended to automatically analyze the post-translational modifications and amino acid substitutions encoded in PEFF databases. Comet's PEFF implementation is presented, along with example results from searching a HEK293 dataset against the neXtProt PEFF database.
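To make concrete the idea of annotations carried in protein description lines, the following is a minimal sketch of reading a PEFF-style header in Python. It is not Comet's implementation. The header line, accession, sequence, and positions are invented for illustration, and the key names (`\PName`, `\ModResPsi`, `\VariantSimple`), the parenthesized `|`-separated item layout, and 1-based positions are assumptions based on common PEFF conventions rather than details given in the abstract.

```python
import re

# Hypothetical PEFF-style description line; accession, names and positions
# are invented for illustration and do not come from neXtProt.
HEADER = (
    ">nxp:NX_EXAMPLE-1 \\PName=Example protein "
    "\\ModResPsi=(5|MOD:00046|O-phospho-L-serine)(9|MOD:00047|O-phospho-L-threonine) "
    "\\VariantSimple=(3|A)(7|K)"
)
SEQUENCE = "MKTSSLLRTQ"  # hypothetical protein sequence

# A key/value pair starts with a backslash and runs to the next "\Key=" or end of line.
KEY_VALUE = re.compile(r"\\(\w+)=(.*?)(?=\s+\\\w+=|$)")
# Each annotation item is wrapped in parentheses. Simplification: items whose
# text itself contains parentheses are not handled by this pattern.
ITEM = re.compile(r"\(([^()]*)\)")


def parse_header(line):
    """Split a header into its accession and a dict of key -> raw value."""
    accession, _, rest = line.lstrip(">").partition(" ")
    fields = {key: value.strip() for key, value in KEY_VALUE.findall(rest)}
    return accession, fields


def split_items(value):
    """Turn '(a|b)(c|d)' into [('a', 'b'), ('c', 'd')]."""
    return [tuple(item.split("|")) for item in ITEM.findall(value)]


def apply_simple_variant(sequence, position, residue):
    """Substitute one residue, assuming 1-based positions."""
    index = int(position) - 1
    return sequence[:index] + residue + sequence[index + 1:]


if __name__ == "__main__":
    accession, fields = parse_header(HEADER)
    print(accession)
    for position, mod_id, name in split_items(fields["ModResPsi"]):
        print("modification", position, mod_id, name)
    for position, residue in split_items(fields["VariantSimple"]):
        print("variant", position, residue,
              apply_simple_variant(SEQUENCE, position, residue))
```

A search tool that consumes such a database would treat each listed modification as a candidate variable modification at that position and each substitution as an alternative sequence to consider, instead of requiring the user to specify them globally.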