Dührkop Kai, Shen Huibin, Meusel Marvin, Rousu Juho, Böcker Sebastian
Chair for Bioinformatics, Friedrich Schiller University, 07743 Jena, Germany;
Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, 02150 Espoo, Finland.
Proc Natl Acad Sci U S A. 2015 Oct 13;112(41):12580-5. doi: 10.1073/pnas.1509788112. Epub 2015 Sep 21.
Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin.
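The last step described in the abstract, searching a structure database with a predicted fingerprint, can be illustrated with a minimal sketch. The abstract does not say how candidates are scored, so this sketch assumes plain Tanimoto similarity and is not the authors' scoring function. It also assumes a simplified representation in which a fingerprint is the set of its "on" bit indices. The function names, candidate names, and bit values are all hypothetical. A real search would compute candidate fingerprints from structures retrieved from a database such as PubChem.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def rank_candidates(
    predicted: Fingerprint, candidates: Dict[str, Fingerprint]
) -> List[Tuple[str, float]]:
    """Score every candidate against the predicted fingerprint, best match first."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    # Sort by descending score; break ties by name so the order is deterministic.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical fingerprint predicted from the spectrum of an unknown compound.
predicted = frozenset({0, 3, 5, 8})

# Hypothetical candidate structures, already converted to fingerprints.
candidates = {
    "candidate_A": frozenset({0, 3, 5, 8, 9}),
    "candidate_B": frozenset({0, 3}),
    "candidate_C": frozenset({1, 2, 4}),
}

ranking = rank_candidates(predicted, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because the predicted fingerprint comes from a machine learning model, individual bits may be wrong. A scoring function that weights each bit by its prediction reliability would be a natural refinement of this unweighted sketch.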