An algorithm for decoy-free false discovery rate estimation in XL-MS/MS proteomics.

Affiliations

Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States.

The Institute for Experiential AI, Northeastern University, Boston, MA 02115, United States.

Publication information

Bioinformatics. 2024 Jun 28;40(Suppl 1):i428-i436. doi: 10.1093/bioinformatics/btae233.

Abstract

MOTIVATION

Cross-linking tandem mass spectrometry (XL-MS/MS) is an established analytical platform used to determine distance constraints between residues within a protein or from physically interacting proteins, thus improving our understanding of protein structure and function. To aid biological discovery with XL-MS/MS, it is essential that pairs of chemically linked peptides be accurately identified, a process that requires: (i) database search, which creates a ranked list of candidate peptide pairs for each experimental spectrum, and (ii) false discovery rate (FDR) estimation, which estimates the expected proportion of false matches among the top-ranked peptide pairs scoring above a given threshold. Currently, the only available FDR estimation mechanism in XL-MS/MS is the target-decoy approach (TDA). However, despite its simplicity, TDA has both theoretical and practical limitations that reduce estimation accuracy and increase run time compared with potential decoy-free approaches (DFAs).
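
To make the TDA baseline concrete, here is a minimal sketch of one common cross-link variant (an assumption: the abstract does not say which TDA formulation is used). Each top-ranked peptide pair is labelled target-target (TT), target-decoy (TD) or decoy-decoy (DD), and the FDR above a score threshold is estimated as (TD - DD) / TT. The function name `xl_tda_fdr` and the toy scores are hypothetical.

```python
from typing import Iterable, Tuple


def xl_tda_fdr(matches: Iterable[Tuple[float, str]], threshold: float) -> float:
    """Estimate FDR among top-ranked pairs scoring >= threshold.

    Each match is (score, kind), where kind is "TT" (target-target),
    "TD" (target-decoy, either order) or "DD" (decoy-decoy).
    Assumed formula: FDR = (TD - DD) / TT, clipped at 0.
    """
    counts = {"TT": 0, "TD": 0, "DD": 0}
    for score, kind in matches:
        if kind not in counts:
            raise ValueError(f"unknown match kind: {kind!r}")
        if score >= threshold:
            counts[kind] += 1
    if counts["TT"] == 0:
        return 0.0  # nothing accepted, so no false target matches to report
    return max(0, counts["TD"] - counts["DD"]) / counts["TT"]


# Hypothetical toy data: (score, decoy class) of each spectrum's top-ranked pair.
matches = [(9.1, "TT"), (8.7, "TT"), (8.2, "TD"), (7.9, "TT"),
           (7.5, "DD"), (7.0, "TD"), (6.8, "TT")]
for t in (8.0, 7.0):
    print(t, xl_tda_fdr(matches, t))
```

The decoy sequences needed for this count are exactly what a decoy-free approach avoids generating and searching.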

RESULTS

We introduce a novel decoy-free framework for FDR estimation in XL-MS/MS. Our approach relies on multi-sample mixtures of skew normal distributions, where the latent components correspond to the scores of correct peptide pairs (both peptides identified correctly), partially incorrect peptide pairs (one peptide identified correctly, the other incorrectly), and incorrect peptide pairs (both peptides identified incorrectly). To learn these components, we exploit the score distributions of first- and second-ranked peptide-spectrum matches for each experimental spectrum and subsequently estimate FDR using a novel expectation-maximization algorithm with constraints. We evaluate the method on ten datasets and provide evidence that the proposed DFA is theoretically sound and a viable alternative to TDA owing to its good performance in terms of accuracy, variance of estimation, and run time.
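
The following minimal sketch shows how an FDR could be read off once three skew normal components have been fitted to first-ranked scores. It assumes that a peptide pair counts as a false match whenever at least one peptide is wrong, so the partially incorrect and incorrect components both enter the numerator: FDR(t) = [π_P S_P(t) + π_I S_I(t)] / Σ_k π_k S_k(t), where π_k is a mixture weight and S_k the survival function of component k. The constrained EM fit over first- and second-ranked scores is not shown. The names `SkewNormal` and `mixture_fdr` and all parameter values are hypothetical. The skew normal CDF uses the identity F(x) = Φ(z) - 2T(z, α), with Owen's T function integrated by Simpson's rule so that only the standard library is needed.

```python
import math
from dataclasses import dataclass


def _norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _owens_t(h: float, a: float, n: int = 200) -> float:
    """Owen's T(h, a) = (1/2pi) * int_0^a exp(-h^2 (1+x^2)/2) / (1+x^2) dx,
    approximated with composite Simpson's rule (n must be even)."""
    def integrand(x: float) -> float:
        return math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)

    step = a / n  # negative when a < 0, which yields the signed integral
    total = integrand(0.0) + integrand(a)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return total * step / 3.0 / (2.0 * math.pi)


@dataclass(frozen=True)
class SkewNormal:
    loc: float    # location xi
    scale: float  # scale omega > 0
    shape: float  # skewness alpha (0 gives a normal distribution)

    def sf(self, x: float) -> float:
        """P(score > x), using CDF(x) = Phi(z) - 2 T(z, alpha)."""
        z = (x - self.loc) / self.scale
        cdf = _norm_cdf(z) - 2.0 * _owens_t(z, self.shape)
        return min(1.0, max(0.0, 1.0 - cdf))  # clip tiny rounding errors


def mixture_fdr(threshold: float,
                weights: dict[str, float],
                components: dict[str, SkewNormal]) -> float:
    """FDR among first-ranked pairs scoring above `threshold`.

    Keys are "correct", "partial" and "incorrect"; a pair is assumed to be
    false if at least one of its two peptides is wrong.
    """
    tail = {k: weights[k] * components[k].sf(threshold) for k in weights}
    accepted = sum(tail.values())
    if accepted == 0.0:
        return 0.0
    return (tail["partial"] + tail["incorrect"]) / accepted


# Hypothetical, unfitted parameters chosen only to illustrate the calculation.
weights = {"correct": 0.4, "partial": 0.2, "incorrect": 0.4}
components = {
    "correct": SkewNormal(loc=20.0, scale=5.0, shape=2.0),
    "partial": SkewNormal(loc=10.0, scale=4.0, shape=2.0),
    "incorrect": SkewNormal(loc=5.0, scale=3.0, shape=3.0),
}
for t in (5.0, 15.0, 25.0):
    print(f"threshold={t:5.1f}  estimated FDR={mixture_fdr(t, weights, components):.4f}")
```

Because every quantity comes from the fitted mixture, no decoy database has to be built or searched; accuracy then rests entirely on how well the skew normal components describe the real score distributions.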

AVAILABILITY AND IMPLEMENTATION

https://github.com/shawn-peng/xlms.

[Figure image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2cf/11256928/ab08c67fd78d/btae233f1.jpg]