David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1.
Bioinformatics. 2013 Jul 15;29(14):1768-75. doi: 10.1093/bioinformatics/btt274. Epub 2013 May 10.
Label-free quantification is an important approach to identifying biomarkers, as it measures how the quantity of peptides changes across different biological samples. One of its fundamental steps is to match the peptide features detected in two datasets to each other. Although ad hoc software tools exist for feature matching, no combinatorial model of the problem has yet been defined.
This article proposes a combinatorial model. Each peptide feature contains a mass value and a retention time value, which are used to calculate a matching weight between a pair of features. Feature matching then amounts to finding the maximum-weight matching between the two sets of features, after applying a to-be-computed time alignment function to all the retention time values of one of the sets. The problem resembles maximum matching in a bipartite graph, but we show that the requirement of time alignment makes it NP-hard. Practical algorithms are also provided. Experiments on real data show that the algorithm compares favorably with other existing methods.
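As an illustration of the model only, and not the algorithm of the article, the following minimal sketch makes several simplifying assumptions: the time alignment function is a constant shift, the matching weight is 1 when two features agree within a mass tolerance and a retention time tolerance and 0 otherwise, and candidate shifts are taken from retention time differences of mass-compatible pairs. The names `compatible`, `max_matching`, `best_shift_matching`, `mass_tol`, and `rt_tol` are hypothetical. With 0/1 weights, the maximum-weight matching for a fixed shift reduces to a maximum-cardinality bipartite matching, computed here with augmenting paths.

```python
from typing import List, Tuple

Feature = Tuple[float, float]  # (mass, retention_time); layout is an assumption


def compatible(a: Feature, b: Feature, shift: float,
               mass_tol: float, rt_tol: float) -> bool:
    """Assumed 0/1 weight: True if masses agree and shifted times agree."""
    return abs(a[0] - b[0]) <= mass_tol and abs(a[1] + shift - b[1]) <= rt_tol


def max_matching(set_a: List[Feature], set_b: List[Feature], shift: float,
                 mass_tol: float, rt_tol: float) -> List[Tuple[int, int]]:
    """Maximum-cardinality bipartite matching via augmenting paths."""
    match_of_b = [-1] * len(set_b)  # index into set_a matched to each b, or -1

    def try_augment(i: int, visited: List[bool]) -> bool:
        for j, b in enumerate(set_b):
            if visited[j] or not compatible(set_a[i], b, shift, mass_tol, rt_tol):
                continue
            visited[j] = True
            # Take b if it is free, or if its current partner can be re-matched.
            if match_of_b[j] == -1 or try_augment(match_of_b[j], visited):
                match_of_b[j] = i
                return True
        return False

    for i in range(len(set_a)):
        try_augment(i, [False] * len(set_b))
    return [(i, j) for j, i in enumerate(match_of_b) if i != -1]


def best_shift_matching(set_a: List[Feature], set_b: List[Feature],
                        mass_tol: float = 0.01, rt_tol: float = 0.5):
    """Try each candidate shift and keep the one with the largest matching."""
    # Heuristic candidate set (an assumption): time differences of
    # mass-compatible pairs, plus the identity shift.
    candidates = {0.0} | {b[1] - a[1] for a in set_a for b in set_b
                          if abs(a[0] - b[0]) <= mass_tol}
    best_shift, best_pairs = 0.0, []
    for shift in sorted(candidates):
        pairs = max_matching(set_a, set_b, shift, mass_tol, rt_tol)
        if len(pairs) > len(best_pairs):
            best_shift, best_pairs = shift, pairs
    return best_shift, best_pairs


if __name__ == "__main__":
    run_1 = [(500.25, 10.0), (600.30, 20.0), (700.40, 30.0)]
    run_2 = [(500.25, 12.0), (600.30, 22.1), (700.40, 31.9), (800.00, 5.0)]
    shift, pairs = best_shift_matching(run_1, run_2)
    print(shift, pairs)
```

Restricting the alignment to a constant shift keeps the search space small enough to enumerate; the NP-hardness result stated above concerns the general problem, where the alignment function must itself be computed.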
Supplementary data are available at Bioinformatics online.