Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, 14482, Germany.
Department of Computer Science and Engineering, Indian Institute of Technology, Ropar, Rupnagar, 140001, India.
Bioinformatics. 2024 Sep 1;40(Suppl 2):ii70-ii78. doi: 10.1093/bioinformatics/btae389.
Accurate quantitative information about protein abundance is crucial for understanding a biological system and its dynamics. Protein abundance is commonly estimated using label-free, bottom-up mass spectrometry (MS) protocols, in which proteins are digested into peptides before quantification via MS. However, missing peptide abundance values, which can make up more than 50% of all abundance values, are a common issue. They propagate to missing protein abundance values, which in turn hinder accurate and reliable downstream analyses.
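To make the propagation from missing peptide values to missing protein values concrete, the following is a minimal sketch on hypothetical toy data. It assumes log-scale peptide intensities and a simple "mean of observed peptides" aggregation rule; the peptide names, protein IDs, and both helper functions are illustrative assumptions, not part of PEPerMINT or any specific quantification pipeline.

```python
from typing import Dict, List, Optional

Row = List[Optional[float]]

# Hypothetical toy data: peptide -> per-sample log intensities; None marks a missing value.
peptide_abundance: Dict[str, Row] = {
    "PEPTIDEA": [20.1, None, 19.8],
    "PEPTIDEB": [None, None, 21.0],
    "PEPTIDEC": [18.5, 18.9, None],
}
# Hypothetical peptide-to-protein mapping.
peptide_to_protein: Dict[str, str] = {"PEPTIDEA": "P1", "PEPTIDEB": "P1", "PEPTIDEC": "P2"}


def missing_fraction(matrix: Dict[str, Row]) -> float:
    """Fraction of all peptide abundance values that are missing."""
    values = [v for row in matrix.values() for v in row]
    return sum(v is None for v in values) / len(values)


def protein_abundance(matrix: Dict[str, Row], mapping: Dict[str, str]) -> Dict[str, Row]:
    """Aggregate peptides to proteins by the mean of observed values (an assumed, simple rule)."""
    n_samples = len(next(iter(matrix.values())))
    rows_by_protein: Dict[str, List[Row]] = {}
    for peptide, protein in mapping.items():
        rows_by_protein.setdefault(protein, []).append(matrix[peptide])
    result: Dict[str, Row] = {}
    for protein, rows in rows_by_protein.items():
        per_sample: Row = []
        for j in range(n_samples):
            observed = [row[j] for row in rows if row[j] is not None]
            # A protein is missing in a sample only when none of its peptides were observed.
            per_sample.append(sum(observed) / len(observed) if observed else None)
        result[protein] = per_sample
    return result
```

Here 4 of 9 peptide values are missing, and protein `P1` becomes missing in the second sample because neither of its peptides was observed there. Imputing at the peptide level, before aggregation, is what avoids such protein-level gaps.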
To impute missing abundance values, we propose PEPerMINT, a graph neural network model that works directly on the peptide level and flexibly takes into account both peptide-to-protein relationships, represented as a graph, and amino acid sequence information. We benchmark our method against 11 common imputation methods on 6 diverse datasets, including cell lines, tissue, and plasma samples. We observe that PEPerMINT consistently outperforms the other imputation methods. Its prediction performance remains high across varying degrees of missingness, different evaluation approaches, and differential expression prediction. As an additional novel feature, PEPerMINT provides meaningful uncertainty estimates, allowing users to tailor imputation to their needs based on the reliability of the imputed values.
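One way uncertainty estimates can be used to tailor imputation is to accept only predictions whose uncertainty falls below a user-chosen threshold. The sketch below assumes a hypothetical model output of a predicted mean and standard deviation per entry; the function `filter_imputations` and its `max_std` parameter are illustrative assumptions, not PEPerMINT's actual interface.

```python
from typing import List, Optional


def filter_imputations(
    observed: List[Optional[float]],
    pred_mean: List[float],
    pred_std: List[float],
    max_std: float,
) -> List[Optional[float]]:
    """Fill missing entries with predictions whose standard deviation is at most max_std.

    observed: measured values, with None marking a missing entry.
    pred_mean, pred_std: hypothetical per-entry model outputs (mean and uncertainty).
    """
    filled: List[Optional[float]] = []
    for obs, mu, sigma in zip(observed, pred_mean, pred_std):
        if obs is not None:
            filled.append(obs)  # keep measured values unchanged
        elif sigma <= max_std:
            filled.append(mu)  # accept a sufficiently reliable imputation
        else:
            filled.append(None)  # leave unreliable entries missing
    return filled
```

For example, `filter_imputations([20.0, None, None], [20.0, 19.5, 17.0], [0.1, 0.3, 1.2], max_std=0.5)` fills the second entry and leaves the third missing. Lowering `max_std` trades coverage for reliability.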
The code is available at https://github.com/DILiS-lab/pepermint.