Division of Biomedical Engineering and Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong.
Bioinformatics. 2013 Oct 1;29(19):2469-76. doi: 10.1093/bioinformatics/btt435. Epub 2013 Jul 30.
Liquid chromatography coupled to mass spectrometry (LC-MS) is the dominant technological platform for proteomics. An LC-MS analysis of a complex biological sample can be visualized as a 'map' of which the positional coordinates are the mass-to-charge ratio (m/z) and chromatographic retention time (RT) of the chemical species profiled. Label-free quantitative proteomics requires the alignment and comparison of multiple LC-MS maps to ascertain the reproducibility of experiments or reveal proteome changes under different conditions. The main challenge in this task lies in correcting inevitable RT shifts. Similar, but not identical, LC instruments and settings can cause peptides to elute at very different times and sometimes in a different order, violating the assumptions of many state-of-the-art alignment tools. To meet this challenge, we developed LWBMatch, a new algorithm based on weighted bipartite matching. Unlike existing tools, which search for accurate warping functions to correct RT shifts, we directly seek a peak-to-peak mapping by maximizing a global similarity function between two LC-MS maps. For alignment tasks with large RT shifts (>500 s), an approximate warping function is determined by locally weighted scatterplot smoothing of potential matched features, detected using a novel voting scheme based on co-elution. For validation, we defined the ground truth for alignment success based on tandem mass spectrometry identifications from sequence searching. We showed that our method outperforms several existing tools in terms of precision and recall, and is capable of aligning maps from different instruments and settings.
Availability: LWBMatch is available at https://sourceforge.net/projects/rt-alignment/.
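The peak-to-peak mapping described in the abstract can be illustrated with a small sketch of maximum-weight bipartite matching between two feature lists. This is not the LWBMatch implementation: the similarity function, the tolerances `mz_tol` and `rt_tol`, and the names `similarity`, `min_cost_assignment` and `match_maps` are assumptions made for illustration, and the LOWESS pre-warping step for large RT shifts is omitted. Each feature is assumed to be an `(mz, rt)` tuple.

```python
def similarity(a, b, mz_tol=0.01, rt_tol=60.0):
    """Hypothetical similarity in [0, 1]; zero outside either tolerance."""
    dmz, drt = abs(a[0] - b[0]), abs(a[1] - b[1])
    if dmz >= mz_tol or drt >= rt_tol:
        return 0.0
    return (1.0 - dmz / mz_tol) * (1.0 - drt / rt_tol)


def min_cost_assignment(cost):
    """Hungarian algorithm on a square cost matrix; returns (row, col) pairs."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j]: row (1-based) currently assigned to column j
    way = [0] * (n + 1)   # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:     # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, n + 1)]


def match_maps(map_a, map_b, mz_tol=0.01, rt_tol=60.0):
    """Return index pairs (i, j) that maximize the total similarity."""
    n, m = len(map_a), len(map_b)
    if n == 0 or m == 0:
        return []
    size = max(n, m)
    # Pad to a square matrix; padded cells have zero similarity.
    sim = [[0.0] * size for _ in range(size)]
    for i, a in enumerate(map_a):
        for j, b in enumerate(map_b):
            sim[i][j] = similarity(a, b, mz_tol, rt_tol)
    # Maximizing similarity is the same as minimizing its negative.
    cost = [[-s for s in row] for row in sim]
    pairs = min_cost_assignment(cost)
    # Zero-similarity pairs (including padding) count as unmatched.
    return sorted((i, j) for i, j in pairs if sim[i][j] > 0.0)


# Toy maps: two features with nearly the same m/z and a third with no partner.
map_a = [(500.250, 100.0), (500.252, 140.0), (650.400, 300.0)]
map_b = [(500.251, 125.0), (500.251, 160.0), (800.500, 310.0)]
print(match_maps(map_a, map_b))
```

In this toy input the single most similar pair is `map_a[1]` with `map_b[0]`, but taking it would leave `map_a[0]` without a partner inside the RT tolerance. Maximizing the total similarity instead pairs feature 0 with 0 and 1 with 1, which is the practical difference between a global matching and a greedy nearest-neighbor assignment.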