APIR: Aggregating Universal Proteomics Database Search Algorithms for Peptide Identification with FDR Control.

Affiliations

Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA.

Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA.

Publication information

Genomics Proteomics Bioinformatics. 2024 Jul 3;22(2). doi: 10.1093/gpbjnl/qzae042.

Abstract

Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. State-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide-spectrum matches (PSMs), which map mass spectra to peptide sequences. Because different database search algorithms use distinct search strategies, they may identify unique PSMs. However, no existing approach can aggregate all user-specified database search algorithms with both a guaranteed increase in the number of identified peptides and control of the false discovery rate (FDR). To fill this gap, we proposed a statistical framework, Aggregation of Peptide Identification Results (APIR), that is compatible with all database search algorithms. Notably, under a given FDR threshold, APIR is guaranteed to identify at least as many peptides as any individual database search algorithm. Evaluation of APIR on a complex proteomics standard dataset showed that APIR is more powerful than individual database search algorithms and empirically controls the FDR. Real data studies showed that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. The APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analyses, e.g., differential gene expression analysis on RNA sequencing data. The APIR R package is available at https://github.com/yiling0210/APIR.
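FDR control for PSM identification is commonly estimated with a target-decoy approach. The sketch below is a minimal, hypothetical Python illustration of one common variant: target-decoy competition with a +1 correction. It assumes each PSM carries a score (higher is better) and a decoy flag. It is not APIR's algorithm, and the names `tdc_threshold`, `select_targets`, and `PSM` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical representation: each PSM is (score, is_decoy); higher scores are better matches.
PSM = Tuple[float, bool]


def tdc_threshold(psms: List[PSM], q: float) -> float:
    """Smallest score cutoff t whose estimated FDR is <= q.

    Estimated FDR(t) = (1 + #decoys with score >= t) / max(1, #targets with score >= t).
    Returns float("inf") when no observed cutoff qualifies.
    """
    # Try observed scores from lowest to highest: the first cutoff that
    # passes is the most permissive one, so it keeps the most targets.
    for t in sorted({score for score, _ in psms}):
        n_decoy = sum(1 for s, d in psms if d and s >= t)
        n_target = sum(1 for s, d in psms if not d and s >= t)
        if (1 + n_decoy) / max(1, n_target) <= q:
            return t
    return float("inf")


def select_targets(psms: List[PSM], q: float) -> List[float]:
    """Scores of target PSMs accepted at estimated FDR level q."""
    t = tdc_threshold(psms, q)
    return [s for s, is_decoy in psms if not is_decoy and s >= t]


if __name__ == "__main__":
    # Toy scores, not real data.
    toy = [(10, False), (9, False), (8, False), (7, False), (6, True),
           (5, False), (4, False), (3, True), (2, False), (1, True)]
    print(tdc_threshold(toy, 0.35), select_targets(toy, 0.35))
```

With these toy scores, q = 0.35 gives a cutoff of 4 and keeps six target PSMs. A stricter q = 0.2 accepts none, because the +1 correction penalizes small target sets.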
