Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany.
Bioinformatics. 2018 Jul 1;34(13):i333-i340. doi: 10.1093/bioinformatics/bty245.
Metabolites, small molecules that are involved in cellular reactions, provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem mass spectrometry to identify the thousands of compounds in a biological sample. Recently, we presented CSI:FingerID for searching in molecular structure databases using tandem mass spectrometry data. CSI:FingerID predicts a molecular fingerprint that encodes the structure of the query compound, then uses this to search a molecular structure database such as PubChem. Scoring of the predicted query fingerprint and deterministic target fingerprints is carried out assuming independence between the molecular properties constituting the fingerprint.
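As an illustration of scoring under the independence assumption, the following minimal sketch multiplies the per-property probabilities (in log space) for each candidate fingerprint. The function name, the toy probabilities and the candidate fingerprints are hypothetical, and the actual CSI:FingerID scoring may weight properties differently.

```python
import math

def independent_log_score(predicted, target, eps=1e-9):
    """Log-probability of a binary target fingerprint when every
    molecular property is treated as independent (illustrative only)."""
    total = 0.0
    for p, bit in zip(predicted, target):
        p = min(max(p, eps), 1.0 - eps)      # keep the logarithm finite
        total += math.log(p if bit else 1.0 - p)
    return total

# Hypothetical predicted probabilities for three molecular properties.
predicted = [0.9, 0.2, 0.7]
# Hypothetical deterministic fingerprints of two database candidates.
candidates = {"A": [1, 0, 1], "B": [0, 1, 1]}

# Rank candidates from best to worst score.
ranking = sorted(
    candidates,
    key=lambda name: independent_log_score(predicted, candidates[name]),
    reverse=True,
)
```

Candidate A agrees with the two confident predictions and is therefore ranked ahead of candidate B.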
We present a scoring that takes into account dependencies between molecular properties. As before, we predict posterior probabilities of molecular properties using machine learning. Dependencies between molecular properties are modeled as a Bayesian tree network; the tree structure is estimated on the fly from the instance data. For each edge, we also estimate the expected covariance between the two random variables. For fixed marginal probabilities, we then estimate conditional probabilities using the known covariance. The corrected posterior probability of each candidate can then be computed, and candidates are ranked by this score. Modeling dependencies improves identification rates of CSI:FingerID by 2.85 percentage points.
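The step from fixed marginals and a known covariance to conditional probabilities can be sketched for two binary properties. For a parent X and child Y with marginals p_x and p_y and covariance c, the joint probability P(X=1, Y=1) equals p_x * p_y + c, and the other three cells follow from the marginals. The code below is a simplified illustration under these assumptions, not the estimator used in the paper: the function names, the tree encoding as a parent list, and the clipping of the covariance to its feasible range are all hypothetical choices, and marginals are assumed to lie strictly between 0 and 1.

```python
import math

def conditional(p_parent, p_child, cov, parent_bit, child_bit):
    """P(child = child_bit | parent = parent_bit) for two binary variables
    with fixed marginals and covariance cov (illustrative sketch)."""
    # Clip the covariance so that all four joint probabilities stay >= 0.
    lo = max(-p_parent * p_child, -(1 - p_parent) * (1 - p_child))
    hi = min(p_parent * (1 - p_child), (1 - p_parent) * p_child)
    cov = min(max(cov, lo), hi)
    px = p_parent if parent_bit else 1 - p_parent
    py = p_child if child_bit else 1 - p_child
    # Covariance is added when the bits agree and subtracted when they differ.
    sign = 1 if parent_bit == child_bit else -1
    return (px * py + sign * cov) / px

def tree_log_score(marginals, parent, cov, target, eps=1e-9):
    """Log-probability of a binary target fingerprint under a tree model.
    parent[i] is the index of the parent of property i, or None for the
    root; cov[i] is the covariance on the edge (parent[i], i)."""
    total = 0.0
    for i, bit in enumerate(target):
        if parent[i] is None:
            prob = marginals[i] if bit else 1 - marginals[i]
        else:
            j = parent[i]
            prob = conditional(marginals[j], marginals[i], cov[i], target[j], bit)
        total += math.log(min(max(prob, eps), 1.0))
    return total

# Hypothetical three-property chain: 0 -> 1 -> 2.
marginals = [0.9, 0.2, 0.7]
parent = [None, 0, 1]
score_dependent = tree_log_score(marginals, parent, [0.0, 0.01, 0.05], [1, 0, 1])
score_independent = tree_log_score(marginals, parent, [0.0, 0.0, 0.0], [1, 0, 1])
```

With all edge covariances set to zero, the tree score reduces to the product of the marginal probabilities, which is the independence-based scoring.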
The new scoring, Bayesian (fixed tree), is integrated into SIRIUS 4.0 (https://bio.informatik.uni-jena.de/software/sirius/).