Institut de Chimie des Substances Naturelles, CNRS UPR 2301, Université Paris-Sud, Université Paris-Saclay, Avenue de la Terrasse, 91198 Gif-sur-Yvette, France.
Anal Chem. 2018 Dec 4;90(23):13900-13908. doi: 10.1021/acs.analchem.8b03099. Epub 2018 Nov 14.
Molecular networking (MN) is becoming a standard bioinformatics tool in the metabolomics community. Its paradigm rests on the observation that compounds with a high degree of chemical similarity share comparable MS fragmentation pathways. To achieve a clear separation between MS spectral clusters, only the most relevant similarity scores are retained, using dedicated filtering steps that require time-consuming parameter optimization. Depending on the filtering values selected, some scores are arbitrarily discarded and part of the information is ignored. The problem of creating a reliable representation of MS spectral data sets can be addressed with algorithms developed for dimensionality reduction and pattern recognition, such as t-distributed stochastic neighbor embedding (t-SNE). This multivariate embedding method pays particular attention to local details, using nonlinear outputs to represent the entire data space. To overcome the limitations inherent to the GNPS workflow and the networking architecture, we developed MetGem. Our software allows the parallel investigation of two complementary representations of the raw data set, one based on a classic GNPS-style MN and the other on the t-SNE algorithm. The t-SNE graph preserves the relationships between related groups of spectra, while the MN output allows an unambiguous separation of clusters. Additionally, almost all parameters can be tuned in real time, and new networks can be generated within a few seconds for small data sets. With the development of this unified interface (https://metgem.github.io), we fulfilled the need for a dedicated, user-friendly, local software tool for MS comparison and spectral network generation.
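The filtering step described above can be illustrated with a minimal sketch. It assumes spectra given as lists of (m/z, intensity) pairs, scores them with a plain cosine on fixed-width m/z bins, and keeps an edge only when the score reaches a threshold and each spectrum is among the other's top-K neighbours. GNPS itself uses a modified cosine that also matches precursor-shifted peaks, and applies further filters such as a maximum component size. The function names and the default values (`bin_width`, `min_score`, `top_k`) are hypothetical and not MetGem's API. The sketch also derives a distance matrix (1 - similarity) of the kind that can feed a t-SNE embedding, so that no score has to be discarded.

```python
import math
from itertools import combinations


def binned_vector(peaks, bin_width=0.02):
    """Sum (mz, intensity) peaks into fixed-width m/z bins and scale to unit L2 norm.
    The binning scheme and bin_width are illustrative assumptions."""
    vec = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else vec


def cosine(a, b):
    """Dot product of two unit-normalised sparse vectors, i.e. their cosine score."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def similarity_matrix(spectra, bin_width=0.02):
    """Symmetric matrix of pairwise cosine scores (self-similarity set to 1.0)."""
    vecs = [binned_vector(s, bin_width) for s in spectra]
    n = len(vecs)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = cosine(vecs[i], vecs[j])
    return sim


def filter_edges(sim, min_score=0.7, top_k=10):
    """GNPS-style pruning (simplified): keep edge i-j only if its score is at least
    min_score and each node is among the other's top_k highest-scoring neighbours."""
    n = len(sim)
    top = []
    for i in range(n):
        candidates = [j for j in range(n) if j != i and sim[i][j] >= min_score]
        candidates.sort(key=lambda j: sim[i][j], reverse=True)
        top.append(set(candidates[:top_k]))
    return sorted((i, j, sim[i][j]) for i in range(n) for j in top[i] if i < j and i in top[j])


def distance_matrix(sim):
    """Convert similarities to distances (1 - score) for an embedding such as t-SNE."""
    return [[max(0.0, 1.0 - s) for s in row] for row in sim]


spectra = [
    [(100.0, 10.0), (150.0, 5.0)],
    [(100.0, 9.0), (150.0, 6.0)],
    [(300.0, 1.0)],
]
sim = similarity_matrix(spectra)
edges = filter_edges(sim, min_score=0.7, top_k=10)  # only spectra 0 and 1 are linked
dist = distance_matrix(sim)
```

In this toy example the third spectrum ends up as an isolated node in the network, whereas the full distance matrix still records how far it lies from the others. A matrix like `dist` can be passed to an embedding routine. One option is scikit-learn's `TSNE(metric="precomputed", init="random")`, which takes the distance matrix through its `fit_transform` method and keeps every pairwise relationship instead of thresholding it.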