Department of Computer Science, School of Science, Aalto University, Espoo, Finland.
School of Computing Science, University of Glasgow, Glasgow, UK.
Bioinformatics. 2021 Jul 19;37(12):1724-1731. doi: 10.1093/bioinformatics/btaa998.
Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures from mass spectrometry (MS) data. Recently, there has been increasing interest in using other information sources, such as liquid chromatography (LC) retention time (RT), to improve on identifications based solely on MS information, such as the precursor mass-to-charge ratio and tandem mass spectra (MS2).
We put forward a probabilistic modelling framework to integrate MS and RT data of multiple features in an LC-MS experiment. We model the MS measurements and all pairwise retention order information as a Markov random field and use efficient approximate inference for scoring and ranking potential molecular structures. Our experiments show that combining MS2 data and retention orders with our approach improves identification accuracy and outperforms state-of-the-art methods. Furthermore, we demonstrate the benefit of our model when only a subset of the LC-MS features has MS2 measurements available in addition to MS1.
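To make the modelling idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each LC-MS feature has a list of candidate structures with a positive MS-based score and a predicted retention-order preference value (higher meaning "expected to elute later"). The sigmoid edge potential, the `weight` parameter and all function names are hypothetical choices made for illustration. The sketch also swaps the approximate inference mentioned above for exact brute-force enumeration of max-marginals, which is only feasible for tiny examples.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def edge_potential(pref_i, pref_j, rt_i, rt_j):
    """Hypothetical pairwise potential: close to 1 when the predicted
    elution order of two candidates agrees with the observed RT order."""
    sign = 1.0 if rt_j > rt_i else (-1.0 if rt_j < rt_i else 0.0)
    return sigmoid(sign * (pref_j - pref_i))


def max_marginals(ms_scores, pref_scores, rts, weight=0.5):
    """Exact max-marginals (in log space) of a fully connected pairwise MRF.

    ms_scores[i][c]   : positive MS-based score of candidate c for feature i
    pref_scores[i][c] : predicted retention-order preference of that candidate
    rts[i]            : observed retention time of feature i
    weight            : assumed trade-off between MS and retention-order terms
    """
    n = len(ms_scores)
    best = [[-math.inf] * len(scores) for scores in ms_scores]
    index_ranges = [range(len(scores)) for scores in ms_scores]
    # Enumerate every joint assignment of one candidate per feature.
    for assignment in itertools.product(*index_ranges):
        log_score = 0.0
        for i, c in enumerate(assignment):
            log_score += (1.0 - weight) * math.log(ms_scores[i][c])
        for i in range(n):
            for j in range(i + 1, n):
                pot = edge_potential(
                    pref_scores[i][assignment[i]],
                    pref_scores[j][assignment[j]],
                    rts[i],
                    rts[j],
                )
                log_score += weight * math.log(pot)
        # A candidate's max-marginal is the best joint score it takes part in.
        for i, c in enumerate(assignment):
            best[i][c] = max(best[i][c], log_score)
    return best


def rank_candidates(ms_scores, pref_scores, rts, weight=0.5):
    """Return, per feature, candidate indices sorted by decreasing max-marginal."""
    marginals = max_marginals(ms_scores, pref_scores, rts, weight)
    return [sorted(range(len(m)), key=lambda c: -m[c]) for m in marginals]


# Toy data: feature 0 elutes before feature 1. Its two candidates tie on MS
# score, so only the retention-order term can separate them.
toy_ms = [[0.5, 0.5], [0.9, 0.1]]
toy_pref = [[4.0, 1.0], [3.0, 0.0]]
toy_rts = [2.0, 5.0]
toy_ranking = rank_candidates(toy_ms, toy_pref, toy_rts)
```

In this toy example the MS scores alone cannot rank the candidates of feature 0, while the retention-order term favours the candidate whose predicted elution is earlier than that of the best candidate for feature 1. Enumeration grows exponentially with the number of features, which is why a practical system would need approximate inference instead.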
Software and data are freely available at https://github.com/aalto-ics-kepaco/msms_rt_score_integration.
Supplementary data are available at Bioinformatics online.