A Review of Imputation Strategies for Isobaric Labeling-Based Shotgun Proteomics.

Affiliations

Computing & Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.

Boeing, Seattle, Washington 98055, United States.

Publication information

J Proteome Res. 2021 Jan 1;20(1):1-13. doi: 10.1021/acs.jproteome.0c00123. Epub 2020 Sep 25.

Abstract

The throughput efficiency and increased depth of coverage provided by isobaric-labeled proteomics measurements have led to increased usage of these techniques. However, the structure of missing data differs from that in unlabeled studies. This review therefore compares the efficacy of nine imputation methods on large isobaric-labeled proteomics data sets to guide researchers on the appropriateness of each method. Imputation methods were evaluated by accuracy, statistical hypothesis test inference, and run time. In general, expectation maximization and random forest imputation methods yielded the best performance, and constant-based methods consistently performed poorly across all data set sizes and percentages of missing values. For data sets with small sample sizes and higher percentages of missing data, results indicate that statistical inference with no imputation may be preferable. On the basis of the findings in this review, there are core imputation methods that perform better for isobaric-labeled proteomics data, but for data sets composed of a small number of samples, great care should be taken in deciding whether imputation is the optimal strategy at all.
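As an illustration of what evaluating imputation "by accuracy" can look like, the following minimal Python sketch hides known values in a simulated abundance matrix, imputes them, and scores each method by root-mean-square error on the hidden entries. Everything in it is an assumption made for illustration and is not taken from the review: the simulated data, the function names, the choice of the global minimum as the constant, and the use of a per-protein mean as a simple stand-in for the model-based methods (expectation maximization and random forest are not implemented here). Values are also masked completely at random, which does not reproduce the structured missingness of real isobaric-labeled data.

```python
import math
import random


def mask_values(matrix, fraction, rng):
    """Copy the matrix, hiding roughly `fraction` of entries as None."""
    masked = [row[:] for row in matrix]
    positions = []
    for i, row in enumerate(matrix):
        for j in range(len(row)):
            if rng.random() < fraction:
                masked[i][j] = None
                positions.append((i, j))
    return masked, positions


def impute_constant(matrix):
    """Constant-based imputation: fill with the global observed minimum."""
    observed = [v for row in matrix for v in row if v is not None]
    fill = min(observed)
    return [[fill if v is None else v for v in row] for row in matrix]


def impute_row_mean(matrix):
    """Fill each missing entry with the mean of its row's observed values."""
    observed_all = [v for row in matrix for v in row if v is not None]
    global_mean = sum(observed_all) / len(observed_all)
    result = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Fall back to the global mean if a whole row is missing.
        fill = sum(observed) / len(observed) if observed else global_mean
        result.append([fill if v is None else v for v in row])
    return result


def rmse(truth, imputed, positions):
    """Root-mean-square error over the hidden positions only."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)

# Hypothetical log2 abundances: 200 proteins x 10 samples,
# each protein with its own baseline plus sample-level noise.
truth = []
for _ in range(200):
    baseline = rng.gauss(20.0, 2.0)
    truth.append([baseline + rng.gauss(0.0, 0.5) for _ in range(10)])

masked, positions = mask_values(truth, 0.2, rng)
rmse_constant = rmse(truth, impute_constant(masked), positions)
rmse_row_mean = rmse(truth, impute_row_mean(masked), positions)

print(f"hidden entries: {len(positions)}")
print(f"RMSE, constant (global minimum): {rmse_constant:.3f}")
print(f"RMSE, per-protein mean:          {rmse_row_mean:.3f}")
```

Because the same hidden positions are scored for both methods, the two errors are directly comparable. The same harness could be rerun with a smaller number of samples or a larger masked fraction to probe the settings where the abstract suggests imputation may not be the preferable choice.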
