Baking a mass-spectrometry data PIE with McMC and simulated annealing: predicting protein post-translational modifications from integrated top-down and bottom-up data.

Affiliation

University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Publication information

Bioinformatics. 2011 Mar 15;27(6):844-52. doi: 10.1093/bioinformatics/btr027.

PMID:21389073

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3051328/

Abstract

MOTIVATION

Post-translational modifications are vital to protein function but are hard to study, especially because several modified isoforms of a protein may be present simultaneously. Mass spectrometers are a powerful tool for investigating modified proteins, but the data they provide are often incomplete, ambiguous and difficult to interpret. Combining data from multiple experimental techniques, especially bottom-up and top-down mass spectrometry, provides complementary information. When integrated with background knowledge, this allows a human expert to interpret which modifications are present and where on a protein they are located. However, the process is arduous and, for high-throughput applications, needs to be automated.

RESULTS

This article explores a data integration methodology based on Markov chain Monte Carlo (McMC) and simulated annealing. Our software, the Protein Inference Engine (the PIE), applies these algorithms using a modular approach, allowing multiple types of data to be considered simultaneously and new data types to be added as needed. Even for complicated data representing multiple modifications and several isoforms, the PIE generates accurate modification predictions, including location. When applied to experimental data collected on the L7/L12 ribosomal protein, the PIE made predictions consistent with manual interpretation for several different L7/L12 isoforms using a combination of bottom-up data and experimentally identified intact masses.
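
To make the general idea concrete, the following is a minimal sketch of simulated annealing with a Metropolis acceptance rule applied to a toy modification-placement problem. It is not the PIE's implementation: the scoring function, the move set, the cooling schedule, the two-entry modification table and every function name here are assumptions chosen for illustration only. The score simply adds the mismatch against one intact (top-down) mass shift to the mismatches against per-peptide (bottom-up) mass shifts.

```python
import math
import random

# Illustrative monoisotopic mass shifts in Da (assumed table, not from the paper).
MOD_MASS = {"methyl": 14.01565, "acetyl": 42.01056}


def score(state, intact_shift, peptides):
    """Lower is better: total mismatch between predicted and observed shifts.

    state        -- dict {residue_index: modification_name}
    intact_shift -- observed mass shift of the intact protein (top-down)
    peptides     -- list of (start, end, observed_shift), end exclusive (bottom-up)
    """
    total = sum(MOD_MASS[m] for m in state.values())
    err = abs(total - intact_shift)
    for start, end, shift in peptides:
        pep = sum(MOD_MASS[m] for pos, m in state.items() if start <= pos < end)
        err += abs(pep - shift)
    return err


def propose(state, n_sites, rng):
    """Local move: clear one random site or set it to a random modification."""
    new = dict(state)
    pos = rng.randrange(n_sites)
    choice = rng.choice([None] + list(MOD_MASS))
    if choice is None:
        new.pop(pos, None)
    else:
        new[pos] = choice
    return new


def anneal(n_sites, intact_shift, peptides, steps=5000, t0=20.0, t1=0.01, seed=0):
    """Simulated annealing; returns the best state seen and its score."""
    rng = random.Random(seed)
    state = {}
    cur = score(state, intact_shift, peptides)
    best, best_score = dict(state), cur
    for i in range(steps):
        # Geometric cooling from t0 down to t1.
        t = t0 * (t1 / t0) ** (i / max(steps - 1, 1))
        cand = propose(state, n_sites, rng)
        s = score(cand, intact_shift, peptides)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if s <= cur or rng.random() < math.exp((cur - s) / t):
            state, cur = cand, s
            if cur < best_score:
                best, best_score = dict(state), cur
    return best, best_score


if __name__ == "__main__":
    # Toy data: a 12-residue protein covered by three peptides.
    peptides = [(0, 4, 14.01565), (4, 8, 0.0), (8, 12, 42.01056)]
    print(anneal(12, 56.02621, peptides))
```

In this toy setup any state with one methylation in residues 0-3 and one acetylation in residues 8-11 scores essentially zero, so the exact site within each peptide remains ambiguous. That is the kind of ambiguity additional data types would be needed to resolve.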

AVAILABILITY

Software, demo projects and source can be downloaded from http://pie.giddingslab.org/
