School of Computing Science, University of Glasgow, Glasgow, UK, School of Computing and Mathematical Sciences, Liverpool John Moores University, Merseyside, UK and Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, UK.
Bioinformatics. 2015 Jun 15;31(12):1999-2006. doi: 10.1093/bioinformatics/btv072. Epub 2015 Feb 2.
The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental designs, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this comparison is peak alignment, one of the most important but challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that coelute and are derived from the same metabolite or peptide. We propose a direct-matching peak alignment method for LC/MS data that incorporates related-peak information (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks required by our method can be obtained from any peak clustering method and are built into a pairwise peak similarity score function. The resulting similarity score matrix is passed to an approximation algorithm for the weighted matching problem, which produces the actual alignment.
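The idea of folding within-run peak groupings into a pairwise score and then solving an approximate weighted matching can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the published implementation: the `Peak` class, the tolerance values, the mixing weight `alpha`, the particular "support" term, and the choice of a greedy matcher (a standard 1/2-approximation for maximum weighted matching).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float   # mass-to-charge ratio
    rt: float   # retention time in seconds
    group: int  # hypothetical label from any within-run peak clustering step


def base_score(p, q, mz_tol=0.01, rt_tol=30.0):
    """Similarity in [0, 1] from m/z and RT closeness; 0 outside the tolerances."""
    dmz, drt = abs(p.mz - q.mz), abs(p.rt - q.rt)
    if dmz > mz_tol or drt > rt_tol:
        return 0.0
    return (1 - dmz / mz_tol) * (1 - drt / rt_tol)


def combined_scores(run_a, run_b, alpha=0.5, **tol):
    """Blend each pair's own score with support from the peaks grouped with it."""
    w = [[base_score(p, q, **tol) for q in run_b] for p in run_a]
    scores = {}
    for i, p in enumerate(run_a):
        mates_a = [k for k, x in enumerate(run_a) if x.group == p.group and k != i]
        for j, q in enumerate(run_b):
            if w[i][j] == 0.0:
                continue  # not a candidate pair
            mates_b = [l for l, y in enumerate(run_b) if y.group == q.group and l != j]
            if mates_a and mates_b:
                # Average, over peaks grouped with p, of their best base score
                # against the peaks grouped with q.
                support = sum(max(w[k][l] for l in mates_b) for k in mates_a) / len(mates_a)
            else:
                support = 0.0
            scores[(i, j)] = alpha * w[i][j] + (1 - alpha) * support
    return scores


def greedy_match(scores):
    """Greedy weighted matching: take pairs in descending score order if both are free."""
    used_a, used_b, pairs = set(), set(), []
    for (i, j), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(pairs)


# Toy runs: two related peaks in run A, the same two in run B shifted by 10 s,
# plus an unrelated decoy in run B that is closer in RT to the first peak.
run_a = [Peak(100.000, 100.0, 0), Peak(101.003, 100.0, 0)]
run_b = [Peak(100.000, 110.0, 0), Peak(101.003, 110.0, 0), Peak(100.000, 95.0, 1)]

print(greedy_match(combined_scores(run_a, run_b, alpha=1.0)))  # base score only
print(greedy_match(combined_scores(run_a, run_b, alpha=0.5)))  # with group support
```

In this toy case the base score alone pairs the first peak of run A with the decoy, whereas the blended score pairs it with the peak whose group partner also has a counterpart. The example only illustrates why related-peak information can change an alignment; it makes no claim about how the published method weights or normalises its scores.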
We demonstrate that related peak information can improve alignment performance. The performance is evaluated on a set of benchmark datasets, where our method performs competitively compared to other popular alignment tools.
The proposed alignment method has been implemented as a stand-alone application in Python, available for download at http://github.com/joewandy/peak-grouping-alignment.