Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States.
The Institute for Experiential AI, Northeastern University, Boston, MA 02115, United States.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i428-i436. doi: 10.1093/bioinformatics/btae233.
Cross-linking tandem mass spectrometry (XL-MS/MS) is an established analytical platform used to determine distance constraints between residues within a protein or between physically interacting proteins, thus improving our understanding of protein structure and function. To aid biological discovery with XL-MS/MS, it is essential that pairs of chemically linked peptides be accurately identified, a process that requires: (i) database search, which creates a ranked list of candidate peptide pairs for each experimental spectrum, and (ii) false discovery rate (FDR) estimation, which determines the probability of a false match in a group of top-ranked peptide pairs with scores above a given threshold. Currently, the only available FDR estimation mechanism in XL-MS/MS is the target-decoy approach (TDA). However, despite its simplicity, TDA has both theoretical and practical limitations that reduce estimation accuracy and increase run time relative to potential decoy-free approaches (DFAs).
We introduce a novel decoy-free framework for FDR estimation in XL-MS/MS. Our approach relies on multi-sample mixtures of skew normal distributions, where the latent components correspond to the scores of correct peptide pairs (both peptides identified correctly), partially incorrect peptide pairs (one peptide identified correctly, the other incorrectly), and incorrect peptide pairs (both peptides identified incorrectly). To learn these components, we exploit the score distributions of first- and second-ranked peptide-spectrum matches for each experimental spectrum and subsequently estimate FDR using a novel expectation-maximization algorithm with constraints. We evaluate the method on ten datasets and provide evidence that the proposed DFA is theoretically sound and a viable alternative to TDA owing to its good performance in terms of accuracy, variance of estimation, and run time.
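To make the mixture-based idea concrete, the following is a minimal sketch, not the authors' implementation, of how an FDR could be read off a three-component skew normal mixture once its parameters are known. The function names, the component parameters, and the choice to count partially incorrect pairs as false are all assumptions made for illustration; the constrained expectation-maximization fit itself is not shown.

```python
import math


def skew_normal_pdf(x, loc, scale, shape):
    """Density of a skew normal distribution with the given location, scale and shape."""
    z = (x - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))
    return 2.0 / scale * phi * big_phi


def tail_mass(threshold, loc, scale, shape, n_steps=4000):
    """Approximate P(score > threshold) by trapezoidal integration of the density.

    The upper limit loc + 12 * scale is an assumption: the density is
    negligible beyond it for any shape value.
    """
    upper = loc + 12.0 * scale
    if threshold >= upper:
        return 0.0
    h = (upper - threshold) / n_steps
    total = 0.5 * (skew_normal_pdf(threshold, loc, scale, shape)
                   + skew_normal_pdf(upper, loc, scale, shape))
    for i in range(1, n_steps):
        total += skew_normal_pdf(threshold + i * h, loc, scale, shape)
    return total * h


def mixture_fdr(threshold, components):
    """Estimated FDR among matches scoring above `threshold`.

    `components` is a list of (weight, loc, scale, shape, is_correct) tuples.
    Components with is_correct=False contribute to the false-match mass.
    """
    false_mass = 0.0
    total_mass = 0.0
    for weight, loc, scale, shape, is_correct in components:
        mass = weight * tail_mass(threshold, loc, scale, shape)
        total_mass += mass
        if not is_correct:
            false_mass += mass
    return false_mass / total_mass if total_mass > 0.0 else 0.0


# Hypothetical parameters, chosen only for illustration (not fitted to data).
EXAMPLE_COMPONENTS = [
    (0.4, 3.0, 1.0, 2.0, True),    # correct pairs
    (0.2, 1.0, 1.0, 1.0, False),   # partially incorrect pairs (treated as false here)
    (0.4, 0.0, 1.0, 0.0, False),   # incorrect pairs
]

if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0, 4.0):
        print(f"threshold={t:.1f}  estimated FDR={mixture_fdr(t, EXAMPLE_COMPONENTS):.4f}")
```

In this sketch the FDR at a threshold is the false components' share of the mixture mass above that threshold, so raising the threshold lowers the estimate whenever the correct component sits at higher scores than the others.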