Decoy methods for assessing false positives and false discovery rates in shotgun proteomics.

Authors

Wang Guanghui, Wu Wells W, Zhang Zheng, Masilamani Shyama, Shen Rong-Fong

Affiliation

Proteomics Core Facility, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.

Publication

Anal Chem. 2009 Jan 1;81(1):146-59. doi: 10.1021/ac801664q.

DOI:10.1021/ac801664q
PMID:19061407
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2653784/
Abstract

The potential of getting a significant number of false positives (FPs) in peptide-spectrum matches (PSMs) obtained by proteomic database search has been well-recognized. Among the attempts to assess FPs, the concomitant use of target and decoy databases is widely practiced. By adjusting filtering criteria, FPs and false discovery rate (FDR) can be controlled at a desired level. Although the target-decoy approach is gaining in popularity, subtle differences in decoy construction (e.g., reversing vs stochastic methods), rate calculation (e.g., total vs unique PSMs), or searching (separate vs composite) do exist among various implementations. In the present study, we evaluated the effects of these differences on FP and FDR estimations using a rat kidney protein sample and the SEQUEST search engine as an example. On the effects of decoy construction, we found that, when a single scoring filter (XCorr) was used, stochastic methods generated a higher estimation of FPs and FDR than sequence reversing methods, likely due to an increase in unique peptides. This higher estimation could largely be attenuated by creating decoy databases similar in effective size but not by a simple normalization with a unique-peptide coefficient. When multiple filters were applied, the differences seen between reversing and stochastic methods significantly diminished, suggesting multiple filterings reduce the dependency on how a decoy is constructed. For a fixed set of filtering criteria, FDR and FPs estimated by using unique PSMs were almost twice those using total PSMs. The higher estimation seemed to be dependent on data acquisition setup. As to the differences between performing separate or composite searches, in general, FDR estimated from the separate search was about three times that from the composite search. The degree of difference gradually decreased as the filtering criteria became more stringent. 
Paradoxically, the estimated true positives in separate search were higher when multiple filters were used. By analyzing a standard protein mixture, we demonstrated that the higher estimation of FDR and FPs in the separate search likely reflected an overestimation, which could be corrected with a simple merging procedure. Our study illustrates the relative merits of different implementations of the target-decoy strategy, which should be worth contemplating when large-scale proteomic biomarker discovery is to be attempted.
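
The comparisons described in the abstract (separate vs composite search, total vs unique PSMs) can be illustrated with a small sketch. This is not the authors' implementation: the `PSM` record, the function names, the toy scores, and the XCorr threshold are all hypothetical, and the estimator used here (decoy hits divided by target hits above the threshold) is only one of several conventions in use. The merging step keeps the better-scoring hit per spectrum, which is an assumed reading of the "simple merging procedure" mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Hypothetical record for one peptide-spectrum match."""
    spectrum_id: str
    peptide: str
    xcorr: float
    is_decoy: bool


def estimate_fdr(psms, threshold):
    """Decoy/target ratio among PSMs at or above the score threshold.

    Assumption: FDR is estimated as D / T; other conventions exist.
    """
    passing = [p for p in psms if p.xcorr >= threshold]
    decoys = sum(p.is_decoy for p in passing)
    targets = len(passing) - decoys
    return decoys / targets if targets else 0.0


def merge_per_spectrum(psms):
    """Keep only the best-scoring hit per spectrum, so that target and
    decoy candidates compete as they would in a composite search."""
    best = {}
    for p in psms:
        current = best.get(p.spectrum_id)
        if current is None or p.xcorr > current.xcorr:
            best[p.spectrum_id] = p
    return list(best.values())


def keep_unique_peptides(psms):
    """Collapse repeated identifications of the same peptide sequence,
    keeping the best-scoring PSM for each."""
    best = {}
    for p in psms:
        current = best.get(p.peptide)
        if current is None or p.xcorr > current.xcorr:
            best[p.peptide] = p
    return list(best.values())


# Toy results of a separate search: every spectrum has one top hit in
# the target database and one in the decoy database (made-up values).
targets = [
    PSM("s1", "PEPA", 3.0, False),
    PSM("s2", "PEPA", 2.8, False),
    PSM("s3", "PEPB", 2.5, False),
    PSM("s4", "PEPC", 2.4, False),
]
decoys = [
    PSM("s1", "DECX", 2.1, True),
    PSM("s2", "DECY", 1.0, True),
    PSM("s3", "DECZ", 2.6, True),
    PSM("s4", "DECW", 1.2, True),
]
THRESHOLD = 2.0  # hypothetical XCorr cutoff

separate_total = estimate_fdr(targets + decoys, THRESHOLD)
separate_unique = estimate_fdr(keep_unique_peptides(targets + decoys), THRESHOLD)
merged_total = estimate_fdr(merge_per_spectrum(targets + decoys), THRESHOLD)

print(f"separate search, total PSMs:  {separate_total:.3f}")
print(f"separate search, unique PSMs: {separate_unique:.3f}")
print(f"after per-spectrum merging:   {merged_total:.3f}")
```

On this toy data the separate search gives a higher estimate than the merged result, and counting unique peptides raises the estimate further because repeated target identifications collapse while decoy hits rarely repeat. The numbers themselves are illustrative only and say nothing about the magnitudes reported in the study.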
