Assessment of label-free quantification and missing value imputation for proteomics in non-human primates.

Author information

Center for Precision Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA.

Southwest National Primate Research Center, San Antonio, TX, USA.

Publication information

BMC Genomics. 2022 Jul 8;23(1):496. doi: 10.1186/s12864-022-08723-1.

Abstract

BACKGROUND

Reliable and effective label-free quantification (LFQ) analyses depend not only on the method of data acquisition in the mass spectrometer, but also on the downstream data processing, including software tools, query database, data normalization and imputation. In non-human primates (NHP), LFQ is challenging because the query databases for NHP are limited, since the genomes of these species are not comprehensively annotated. This invariably results in limited discovery of proteins and associated post-translational modifications (PTMs) and a higher fraction of missing data points. While identifying fewer proteins and PTMs because of database limitations can hinder the discovery of important and meaningful biological information, missing data also limits downstream analyses (e.g., multivariate analyses), decreases statistical power, biases statistical inference, and makes biological interpretation of the data more challenging. In this study we attempted to address both issues: first, we used the MetaMorpheus proteomics search engine to counter the limits of NHP query databases and maximize the discovery of proteins and associated PTMs, and second, we evaluated different imputation methods for accurate data inference. We used a generic approach for missing data imputation analysis without distinguishing the potential source of missing data (either non-assigned m/z or missing values across runs).

RESULTS

Using the MetaMorpheus proteomics search engine, we obtained quantitative data for 1622 proteins and 10,634 peptides, including 58 different PTMs (biological, metal and artifacts), across a diverse age range of NHP brain frontal cortex. However, among the 1622 proteins identified, only 293 proteins were quantified across all samples with no missing values, emphasizing the importance of implementing an accurate and statistically valid imputation method to fill in missing data. In our imputation analysis we demonstrate that single-imputation methods that borrow information from correlated proteins, such as Generalized Ridge Regression (GRR), Random Forest (RF), local least squares (LLS), and Bayesian Principal Component Analysis (BPCA), are able to estimate missing protein abundance values with high accuracy.
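
The idea of scoring an imputation method can be illustrated with a mask-and-score experiment: hide a fraction of known values in a complete abundance matrix, impute them, and compare the estimates with the hidden truth. The sketch below is only an illustration under stated assumptions and is not the pipeline used in the study. The data are simulated, all function names are hypothetical, the error metric (RMSE divided by the standard deviation of the hidden true values) is one common normalization that the abstract does not specify, and the "best neighbor" regression is a deliberately simple stand-in for methods that borrow information from correlated proteins, not an implementation of GRR, RF, LLS, or BPCA.

```python
import math
import random
import statistics


def simulate(n_proteins=30, n_samples=20, seed=0):
    """Simulated log-abundances: proteins share one latent per-sample factor (assumption)."""
    rng = random.Random(seed)
    factor = [rng.gauss(0, 1) for _ in range(n_samples)]
    matrix = []
    for _ in range(n_proteins):
        base, loading = rng.uniform(18, 28), rng.uniform(0.5, 2.0)
        matrix.append([base + loading * f + rng.gauss(0, 0.1) for f in factor])
    return matrix


def mask_values(matrix, fraction, rng):
    """Copy the matrix, set a random `fraction` of cells to None, return copy and hidden cells."""
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    masked = [row[:] for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_row_mean(masked):
    """Baseline: fill each missing value with that protein's observed mean."""
    out = []
    for row in masked:
        observed = [v for v in row if v is not None]
        fill = statistics.fmean(observed) if observed else 0.0
        out.append([fill if v is None else v for v in row])
    return out


def pearson_fit(x, y):
    """Return (correlation, slope, intercept) for a least-squares fit of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0, my
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), slope, my - slope * mx


def impute_best_neighbor(masked, min_shared=3):
    """Fill each gap by regressing on the most correlated protein observed in that sample."""
    out = impute_row_mean(masked)  # fallback when no usable neighbor exists
    n_samples = len(masked[0])
    for i, row in enumerate(masked):
        for j, value in enumerate(row):
            if value is not None:
                continue
            best = None
            for k, other in enumerate(masked):
                if k == i or other[j] is None:
                    continue
                shared = [s for s in range(n_samples)
                          if row[s] is not None and other[s] is not None]
                if len(shared) < min_shared:
                    continue
                r, slope, intercept = pearson_fit([other[s] for s in shared],
                                                  [row[s] for s in shared])
                if best is None or abs(r) > abs(best[0]):
                    best = (r, slope, intercept, other[j])
            if best is not None:
                out[i][j] = best[2] + best[1] * best[3]
    return out


def nrmse(truth, imputed, hidden):
    """RMSE over the hidden cells, divided by the standard deviation of their true values."""
    errors = [(imputed[i][j] - truth[i][j]) ** 2 for i, j in hidden]
    true_vals = [truth[i][j] for i, j in hidden]
    return math.sqrt(statistics.fmean(errors)) / statistics.pstdev(true_vals)


truth = simulate()
masked, hidden = mask_values(truth, 0.10, random.Random(1))
score_mean = nrmse(truth, impute_row_mean(masked), hidden)
score_neighbor = nrmse(truth, impute_best_neighbor(masked), hidden)
print(f"row-mean NRMSE:      {score_mean:.3f}")
print(f"best-neighbor NRMSE: {score_neighbor:.3f}")
```

A lower score means the imputed values sit closer to the hidden truth. On real LFQ data the same procedure would start from the subset of proteins with no missing values (293 of 1622 here), so that the hidden entries have known reference values.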

CONCLUSIONS

Overall, this study offers a detailed comparative analysis of LFQ data generated in NHP and proposes strategies for improved LFQ in NHP proteomics data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/85e6/9264528/c8123402e5fe/12864_2022_8723_Fig1_HTML.jpg
