Method for assessing the statistical significance of mass spectral similarities using basic local alignment search tool statistics.

Author information

Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Suita, Osaka 565-0871, Japan.

Publication information

Anal Chem. 2013 Sep 3;85(17):8291-7. doi: 10.1021/ac401564v. Epub 2013 Aug 14.

Abstract

A novel method for assessing the statistical significance of mass spectral similarities was developed using modified basic local alignment search tool (BLAST; Karlin-Altschul) statistics. In gas chromatography/mass spectrometry-based metabolomics, many signals in raw metabolome data are identified on the basis of unexpected similarities among mass spectra and the spectra of standards. Since there is inevitably noise in the observed spectra, a list of identified metabolites includes some false positives. In the developed method, electron ionization (EI) mass spectrometry-BLAST, a similarity score of two mass spectra is calculated using a general scoring scheme, from which the probability of obtaining the score by chance (P value) is calculated. For this purpose, a simple rule for converting a unit EI mass spectrum to a mass spectral sequence as well as a score matrix for aligned mass spectral sequences was developed. A Monte Carlo simulation using randomly generated mass spectral sequences demonstrated that the null distribution or the expected number of hits (E value) follows modified Karlin-Altschul statistics. A metabolite data set obtained from green tea extract was analyzed using the developed method. Among 171 metabolite signals in the metabolome data, 93 signals were identified on the basis of significant similarities (P < 0.015) with reference data. Since the expected number of false positives is 2.6, the false discovery rate was estimated to be 2.8%, indicating that the search threshold (P < 0.015) is reasonable for metabolite identification.
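The arithmetic behind the reported false discovery rate can be checked with a short sketch. The abstract does not describe how the Karlin-Altschul statistics were modified, so the E-value and P-value functions below use the standard BLAST form (E = K·m·n·exp(-λS), P = 1 - exp(-E)) purely as an illustration. All function and parameter names are hypothetical. It is also an assumption here that the expected 2.6 false positives come from multiplying the 171 query signals by the P threshold of 0.015.

```python
import math


def karlin_altschul_e_value(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Standard (unmodified) Karlin-Altschul E value: E = K * m * n * exp(-lambda * S).

    m and n are the lengths of the query and database sequences; lam and k are
    the statistical parameters of the scoring system. The paper uses a modified
    form that the abstract does not specify.
    """
    return k * m * n * math.exp(-lam * score)


def p_value_from_e_value(e_value: float) -> float:
    """Probability of at least one chance hit with this score: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


def expected_false_positives(n_queries: int, p_threshold: float) -> float:
    """Expected chance hits if each query is tested once at p_threshold (assumption)."""
    return n_queries * p_threshold


def false_discovery_rate(expected_fp: float, n_identified: int) -> float:
    """Estimated FDR: expected false positives divided by identified signals."""
    return expected_fp / n_identified


if __name__ == "__main__":
    # Figures taken from the abstract: 171 signals, 93 identified, P < 0.015.
    fp = expected_false_positives(171, 0.015)
    fdr = false_discovery_rate(fp, 93)
    print(f"expected false positives: {fp:.1f}")
    print(f"estimated FDR: {100 * fdr:.1f}%")
```

With the figures from the abstract, the expected number of false positives rounds to 2.6 and the estimated false discovery rate rounds to 2.8%, matching the reported values.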
