Department of Computer Science, San Francisco State University, 1600 Holloway Avenue, San Francisco, CA 94132, USA.
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S12. doi: 10.1186/1471-2105-12-S1-S12.
Determining the disulfide (S-S) bond pattern in a protein is often crucial for understanding its structure and function. In recent research, mass spectrometry (MS)-based analysis has been applied to this problem following protein digestion under both partial-reduction and non-reduction conditions. However, this paradigm still awaits solutions to certain algorithmic problems. The most fundamental of these is efficiently matching an exponentially growing set of putative S-S-bonded structural alternatives against large amounts of experimental spectrometric data. Current methods circumvent this challenge primarily through simplifications. For example, they assume that only certain ion types occur (b-ions and y-ions), namely those that predominate in the more popular dissociation methods such as collision-induced dissociation (CID). Unfortunately, this can degrade the quality of the results.
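The sketch below makes the growth concrete. It counts abstract disulfide connectivity patterns for a protein with n cysteines, using a standard combinatorial count rather than anything taken from the paper. If every cysteine must be bonded (n even), there are (n - 1)!! = n! / (2^(n/2) (n/2)!) patterns. If free cysteines are also allowed, the count is the number of matchings of a complete graph on n vertices. The function names are hypothetical. The count also ignores digestion, so the real space of MS structural alternatives is larger still, because it includes the fragment combinations that must be matched against spectra.

```python
from math import comb, factorial


def pairings(m):
    """Ways to split m cysteines (m even) into m // 2 disulfide bonds: (m - 1)!!."""
    k = m // 2
    return factorial(m) // (2 ** k * factorial(k))


def fully_bonded_patterns(n_cys):
    """Patterns in which every cysteine is bonded (requires an even count)."""
    if n_cys % 2:
        raise ValueError("every cysteine can be bonded only if n_cys is even")
    return pairings(n_cys)


def all_patterns(n_cys):
    """Patterns with any number of bonds, free cysteines allowed."""
    # Choose which 2k cysteines are bonded, then pair them up.
    return sum(comb(n_cys, 2 * k) * pairings(2 * k) for k in range(n_cys // 2 + 1))


for n in (2, 4, 6, 8, 10):
    print(n, fully_bonded_patterns(n), all_patterns(n))
```

For 2, 4, 6, 8 and 10 cysteines this gives 1, 3, 15, 105 and 945 fully bonded patterns. When free cysteines are allowed, the counts are 2, 10, 76, 764 and 9496.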
We present an algorithmic approach to this problem that can analyze multiple ion types (a, b, bo, b*, c, x, y, yo, y*, and z) with high computational efficiency. It can also handle complex bonding topologies, such as inter- and intra-peptide bonding involving more than two peptides. The proposed approach combines a search formulation based on approximation algorithms with data-driven parameter estimation. This formulation considers only those regions of the search space in which the correct solution is highly likely to reside. The putative disulfide bonds thus obtained are finally combined into a globally consistent pattern to yield the overall disulfide bonding topology of the molecule. Additionally, each bond is assigned a confidence score, which aids in interpreting and assimilating the results.
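One natural reading of "combined in a globally consistent pattern" is that each cysteine may take part in at most one disulfide bond. Under that reading, the scored putative bonds must be filtered into a non-conflicting set. The sketch below assumes that the best pattern is the one with the maximum total confidence score. That is a maximum-weight matching on a graph whose vertices are cysteines. This selection rule, the function name `best_consistent_pattern` and the example scores are illustrative assumptions, not the paper's published procedure. The exhaustive search is exponential in the number of cysteines and is meant only for small examples. A polynomial-time alternative is Edmonds' blossom algorithm, available for example as `networkx.max_weight_matching`.

```python
from functools import lru_cache


def best_consistent_pattern(n_cys, scored_bonds):
    """Choose non-conflicting putative bonds with the largest total score.

    n_cys: number of cysteines, indexed 0 .. n_cys - 1.
    scored_bonds: dict mapping a pair (i, j), i != j, to a confidence score
    (hypothetical input format).
    Returns (total_score, sorted list of chosen (i, j) bonds).
    Exhaustive over subsets of cysteines, so only suitable for small n_cys.
    """
    # For each cysteine, list its candidate partners and the bond's score.
    partners = {i: [] for i in range(n_cys)}
    for (i, j), score in scored_bonds.items():
        if i == j:
            raise ValueError("a cysteine cannot bond to itself")
        partners[i].append((j, score))
        partners[j].append((i, score))

    @lru_cache(maxsize=None)
    def solve(used):
        # 'used' is a bitmask of cysteines already decided (bonded or left free).
        i = 0
        while i < n_cys and (used >> i) & 1:
            i += 1
        if i == n_cys:
            return 0.0, ()
        # Option 1: leave cysteine i unbonded (a free thiol).
        best = solve(used | (1 << i))
        # Option 2: bond cysteine i to a still-undecided partner j.
        for j, score in partners[i]:
            if not (used >> j) & 1:
                sub_score, sub_bonds = solve(used | (1 << i) | (1 << j))
                if score + sub_score > best[0]:
                    best = (score + sub_score, ((min(i, j), max(i, j)),) + sub_bonds)
        return best

    total, bonds = solve(0)
    return total, sorted(bonds)


# Made-up confidence scores for four cysteines.
scores = {(0, 1): 0.9, (2, 3): 0.8, (1, 2): 0.95, (0, 3): 0.3}
total, bonds = best_consistent_pattern(4, scores)
print(round(total, 2), bonds)  # 1.7 [(0, 1), (2, 3)]
```

The example does not choose the single highest-scoring bond, (1, 2). Taking it would block the pair (0, 1) and (2, 3), whose combined score is higher.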
The method was tested on nine different eukaryotic glycosyltransferases possessing disulfide bonding topologies of varying complexity. Its performance was characterized by high efficiency (in terms of both running time and the fraction of the search space considered), sensitivity, specificity, and accuracy. When compared with other state-of-the-art techniques, it performed as well as or better than the competing methods. An implementation is available at: http://tintin.sfsu.edu/~whemurad/disulfidebond.
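The evaluation criteria named above can be made concrete for bond prediction by treating every unordered pair of cysteines as one classification case. The sketch below uses the textbook definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) and accuracy = (TP+TN)/total. Several choices are assumptions made for illustration: counting negatives over all cysteine pairs, returning 1.0 when a denominator is zero, the helper names, and the toy patterns. The paper's exact evaluation protocol may differ.

```python
from itertools import combinations


def _normalize(bonds):
    # Treat (i, j) and (j, i) as the same cysteine pair.
    return {tuple(sorted(b)) for b in bonds}


def bond_metrics(n_cys, predicted, actual):
    """Sensitivity, specificity and accuracy of a predicted disulfide pattern.

    Every unordered pair of the n_cys cysteines is one classification case:
    positive if the pair is bonded, negative otherwise (an assumed convention).
    """
    if n_cys < 2:
        raise ValueError("need at least two cysteines")
    all_pairs = set(combinations(range(n_cys), 2))
    predicted, actual = _normalize(predicted), _normalize(actual)
    if not (predicted | actual) <= all_pairs:
        raise ValueError("bond refers to an unknown cysteine index")
    tp = len(predicted & actual)        # correctly predicted bonds
    fp = len(predicted - actual)        # predicted but not real
    fn = len(actual - predicted)        # real but missed
    tn = len(all_pairs) - tp - fp - fn  # correctly left unbonded
    # Assumed convention: an empty denominator counts as a perfect score.
    sensitivity = tp / (tp + fn) if tp + fn else 1.0
    specificity = tn / (tn + fp) if tn + fp else 1.0
    accuracy = (tp + tn) / len(all_pairs)
    return sensitivity, specificity, accuracy


truth = [(0, 1), (2, 3), (4, 5)]
prediction = [(1, 0), (2, 3)]  # misses the (4, 5) bond
print(bond_metrics(6, prediction, truth))
```

In this toy case, one of the three real bonds is missed, so sensitivity is 2/3. No false bonds are predicted, so specificity is 1.0. Accuracy over the 15 possible pairs is 14/15.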
This research addresses some of the significant challenges in MS-based disulfide bond determination. To the best of our knowledge, this is the first algorithmic work in this problem setting that can consider multiple ion types while simultaneously ensuring polynomial time complexity and highly accurate results.