False discovery rate estimation using candidate peptides for each spectrum.

Author information

Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea.

Biomedical Informatics Team, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea.

Publication information

BMC Bioinformatics. 2022 Nov 1;23(1):454. doi: 10.1186/s12859-022-05002-4.

Abstract

BACKGROUND

False discovery rate (FDR) estimation is very important in proteomics. The target-decoy strategy (TDS), which is widely used for FDR estimation, assumes that an incorrectly identified spectrum is equally likely to match a target peptide or a decoy peptide. For most spectra, however, these two probabilities are not identical. We propose cTDS (target-decoy strategy with candidate peptides), which estimates the FDR more accurately by using, for each spectrum, the probability that it is incorrectly identified as a target or as a decoy peptide.
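The equal-probability assumption can be made concrete with a small sketch. The Python below first computes the usual TDS estimate (decoy PSMs divided by target PSMs above a score threshold) and then a weighted variant in which each decoy PSM carries its own probability `p_target` that an incorrect identification of that spectrum lands on a target peptide. The `PSM` class, the `p_target` field, and the weighting `p_target / (1 - p_target)` are illustrative assumptions, not the formula published for cTDS.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record used only for this sketch."""
    score: float
    is_decoy: bool
    # Assumed probability that an incorrect identification of this spectrum
    # hits a target peptide; 0.5 corresponds to the plain TDS assumption.
    p_target: float = 0.5


def tds_fdr(psms: Iterable[PSM], threshold: float) -> float:
    """Plain target-decoy estimate: decoy PSMs / target PSMs above the threshold."""
    accepted = [p for p in psms if p.score >= threshold]
    targets = sum(1 for p in accepted if not p.is_decoy)
    decoys = sum(1 for p in accepted if p.is_decoy)
    return decoys / targets if targets else 0.0


def weighted_fdr(psms: Iterable[PSM], threshold: float) -> float:
    """Illustrative weighted estimate (an assumption, not the published cTDS formula).

    Each accepted decoy PSM is taken to stand for p_target / (1 - p_target)
    expected false target PSMs, instead of exactly one as in plain TDS.
    """
    accepted = [p for p in psms if p.score >= threshold]
    targets = sum(1 for p in accepted if not p.is_decoy)
    if targets == 0:
        return 0.0
    expected_false_targets = 0.0
    for p in accepted:
        if p.is_decoy:
            if not 0.0 <= p.p_target < 1.0:
                raise ValueError("p_target of a decoy PSM must lie in [0, 1)")
            expected_false_targets += p.p_target / (1.0 - p.p_target)
    return expected_false_targets / targets


# Toy input, made up for illustration only.
example = [
    PSM(10.0, False),
    PSM(9.0, False),
    PSM(8.0, True),
    PSM(7.0, False, p_target=0.6),
    PSM(6.0, True, p_target=0.6),
]
print(tds_fdr(example, 6.0), weighted_fdr(example, 6.0))
```

When every spectrum has `p_target = 0.5`, each decoy PSM has weight 1 and the weighted estimate equals the plain TDS estimate. The two diverge only when the per-spectrum probabilities move away from 0.5.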

RESULTS

For most spectra, the probability of being incorrectly identified as a target or as a decoy peptide is close to 0.5, but only about 1.14-4.85% of all spectra have a probability of exactly 0.5. We used an entrapment sequence method to demonstrate the accuracy of cTDS. For fixed FDR thresholds (1-10%), the false match rate (FMR) under cTDS is closer to the threshold than the FMR under TDS. Using the HEK293 dataset, we compared the number of peptide-spectrum matches (PSMs) obtained with TDS and cTDS at a 1% FDR threshold. In the first and third replicates, cTDS yielded more PSMs than TDS for the reverse, pseudo-reverse, shuffle, and de Bruijn databases (by about 0.001-0.132%) and fewer PSMs than TDS for the pseudo-shuffle database (by about 0.05-0.126%). In the second replicate, cTDS yielded more PSMs than TDS for all databases (by about 0.013-0.274%).

CONCLUSIONS

When spectra are identified incorrectly, the probabilities of matching a target peptide and of matching a decoy peptide are, for most spectra, not identical. We therefore propose cTDS, which estimates the FDR more accurately by using the probability that each spectrum is incorrectly identified as a target or as a decoy peptide.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecdc/9623924/833c9bdbb197/12859_2022_5002_Fig1_HTML.jpg
