Interdisciplinary Center for Scientific Computing, University of Heidelberg, Heidelberg, Germany.
Bioinformatics. 2010 Jun 15;26(12):1535-41. doi: 10.1093/bioinformatics/btq165. Epub 2010 May 3.
Time-resolved hydrogen exchange (HX) followed by mass spectrometry (MS) is a key technology for studying protein structure, dynamics and interactions. HX experiments deliver a time-dependent distribution of deuteration levels for peptide sequences of the protein of interest. The robust and complete estimation of this distribution for as many peptide fragments as possible is instrumental to understanding dynamic protein-level HX behavior. Currently, this data interpretation step is still a bottleneck in the overall HX/MS workflow.
We propose HeXicon, a novel algorithmic workflow for automatic deuteration distribution estimation at increased sequence coverage. Based on an L1-regularized feature extraction routine, HeXicon extracts the full deuteration distribution, which allows insight into possible bimodal exchange behavior of proteins, rather than just an average deuteration for each time point. Further, it is capable of addressing ill-posed estimation problems, yielding sparse and physically reasonable results. HeXicon makes use of existing peptide sequence information, which is augmented by an inferred list of peptide candidates derived from a known protein sequence. In conjunction with a supervised classification procedure that balances sensitivity and specificity, HeXicon can deliver results with increased sequence coverage.
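To illustrate why an L1-regularized fit can recover a full, possibly bimodal, deuteration distribution rather than only its mean, the following is a minimal sketch and not HeXicon's actual implementation (which is in C++ and is not described at this level in the abstract). It assumes the observed isotope envelope is a non-negative mixture of the undeuterated pattern shifted by 0, 1, 2, ... mass units, and solves a non-negative L1-penalized least-squares problem by projected proximal gradient descent. The isotope pattern values, the function names `shift_matrix` and `estimate_distribution`, and the parameters `lam` and `n_iter` are all hypothetical choices made for this example, and the data are synthetic and noise-free.

```python
import numpy as np

def shift_matrix(base_pattern, max_deuterons):
    """Column k holds the undeuterated isotope pattern shifted by k mass units."""
    m = len(base_pattern)
    A = np.zeros((max_deuterons + m, max_deuterons + 1))
    for k in range(max_deuterons + 1):
        A[k:k + m, k] = base_pattern
    return A

def estimate_distribution(A, b, lam=1e-3, n_iter=2000):
    """Minimize 0.5*||A x - b||^2 + lam*sum(x) subject to x >= 0.

    With x >= 0 the L1 norm equals sum(x), so each proximal step is a
    gradient step on the smooth part plus lam, followed by clipping at zero.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = np.maximum(0.0, x - step * (grad + lam))
    return x

# Hypothetical natural isotope pattern of an undeuterated peptide (sums to 1).
base = np.array([0.55, 0.30, 0.11, 0.04])
A = shift_matrix(base, max_deuterons=8)

# Synthetic bimodal ground truth: 60% of molecules carry 1 deuteron, 40% carry 6.
true_x = np.zeros(9)
true_x[1], true_x[6] = 0.6, 0.4
b = A @ true_x  # noise-free simulated isotope envelope

x_hat = estimate_distribution(A, b)

# The mean alone would hide the two populations that x_hat makes visible.
mean_deuteration = float(np.arange(9) @ x_hat / x_hat.sum())
print(np.round(x_hat, 3), round(mean_deuteration, 2))
```

In this toy setting the two populations appear as two separate non-zero entries of `x_hat`, whereas the average deuteration (about 3 deuterons) corresponds to a state that no molecule actually occupies. The penalty weight `lam` trades a small downward bias in the estimated abundances for sparsity; with noisy data it would need to be tuned.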
The entire HeXicon workflow has been implemented in C++ and includes a graphical user interface. It is available at http://hci.iwr.uni-heidelberg.de/software.php.
Supplementary data are available at Bioinformatics online.