Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
Department of Environmental Science, University of Arizona, Tucson 85712, United States.
J Chem Inf Model. 2019 Sep 23;59(9):4052-4060. doi: 10.1021/acs.jcim.9b00444. Epub 2019 Aug 20.
The current gold standard for unambiguous molecular identification in metabolomics is to compare two or more orthogonal properties measured from authentic reference materials (standards) against experimental data acquired in the same laboratory with the same analytical methods. This requirement significantly limits comprehensive chemical identification of small molecules in complex samples: the process is time-consuming and costly, and the majority of molecules are not yet represented by standards. Thus, there is a need to assemble evidence for the presence of small molecules in complex samples using libraries of calculated chemical properties. To address this need, we developed a Multi-Attribute Matching Engine (MAME) and a library derived in part from our in silico chemical library engine (ISiCLE). Here, we describe an initial evaluation of these methods in a blinded analysis of synthetic chemical mixtures as part of the U.S. Environmental Protection Agency's (EPA) Non-Targeted Analysis Collaborative Trial (ENTACT, Phase 1). For molecules in all mixtures, the initial blinded false negative rate (FNR), false discovery rate (FDR), and accuracy were 57%, 77%, and 91%, respectively. For high evidence scores, the FDR was 35%. After the sample compositions were unblinded, we optimized the scoring parameters to better exploit the available evidence, which increased the accuracy for molecules suspected to be present. The final FNR, FDR, and accuracy were 67%, 53%, and 96%, respectively. For high evidence scores, the FDR was 10%. This study demonstrates that multiattribute matching methods in conjunction with in silico libraries may one day reduce reliance on experimentally derived libraries when building evidence for the presence of molecules in complex samples.
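The three reported metrics can be sketched from a confusion matrix as follows. This is a minimal illustration using the conventional textbook definitions, which the abstract does not spell out and which may differ in detail from the authors' bookkeeping; the `ConfusionCounts` type, the function names, and the counts in `example` are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of library molecules by reported vs. actual presence (hypothetical type)."""
    tp: int  # reported present, actually present
    fp: int  # reported present, actually absent
    tn: int  # reported absent, actually absent
    fn: int  # reported absent, actually present


def false_negative_rate(c: ConfusionCounts) -> float:
    # Fraction of truly present molecules that were missed.
    return c.fn / (c.fn + c.tp)


def false_discovery_rate(c: ConfusionCounts) -> float:
    # Fraction of reported molecules that were not actually present.
    return c.fp / (c.fp + c.tp)


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all library molecules classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Illustrative counts only (assumption): a large library in which most
# candidates are correctly ruled out.
example = ConfusionCounts(tp=40, fp=60, tn=860, fn=40)

print(f"FNR      = {false_negative_rate(example):.2f}")
print(f"FDR      = {false_discovery_rate(example):.2f}")
print(f"accuracy = {accuracy(example):.2f}")
```

Under these definitions, accuracy counts true negatives while FNR and FDR do not, so a library dominated by correctly excluded candidates can yield high accuracy alongside high FNR and FDR, the pattern seen in the figures reported above.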