ProJect: a powerful mixed-model missing value imputation method.

Affiliations

School of Biological Sciences, Nanyang Technological University, Singapore.

Department of Computer Science, National University of Singapore, Singapore.

Publication information

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad233.

DOI:10.1093/bib/bbad233
PMID:37419612
Abstract

Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we used renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, and bladder cancer (BladderBatch) and glioblastoma (GBM) microarray gene expression datasets. Our results demonstrate that ProJect consistently outperforms the other MVI methods tested. It achieves the lowest normalized root mean square error (NRMSE; on average, 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the lowest Procrustes sum of squared error (Procrustes SS; 79.71% less error in RC_C, 38.36% in RC_full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared with the next-best method). ProJect also achieves the highest correlation coefficient across all types of MV combinations (0.64% higher in RC_C, 0.24% in RC_full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM than the second-best method). ProJect's key strength is its ability to handle the different types of MVs commonly found in real-world data. Unlike most MVI methods, which are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines whether an MV is missing at random (MAR) or missing not at random (MNAR). It then applies a targeted imputation strategy to each MV type, resulting in more accurate and reliable imputation outcomes.
An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.
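The abstract's central mechanism is easier to picture with a small example. ProJect decides whether each missing value is missing at random (MAR) or missing not at random (MNAR) and then applies a matched imputer, and its headline metric is NRMSE. The Python sketch below illustrates that general idea only. It is not ProJect's algorithm; the R package at the URL above is the reference implementation. Several parts are illustrative assumptions. The first is the decision rule in the hypothetical `classify_missing`: a feature counts as MNAR if it is entirely missing or its observed mean falls below the first quartile of all observed values. The second is the pair of stand-in imputers, a half-minimum fill for MNAR features and a feature-mean fill for MAR features. Finally, for brevity the decision is made per feature rather than per value. The `nrmse` helper uses one common definition (square root of MSE divided by the variance of the true values).

```python
import math
from statistics import mean, quantiles
from typing import List, Optional

# Rows are features (e.g. proteins or genes), columns are samples; None marks a missing value.
Matrix = List[List[Optional[float]]]


def classify_missing(data: Matrix) -> List[str]:
    """Label each feature's missing values 'MNAR' or 'MAR' using a hypothetical rule.

    Illustrative assumption, not ProJect's rule: a feature that is entirely missing,
    or whose observed mean lies below the first quartile of all observed values,
    is treated as left-censored (missing not at random).
    """
    observed = [v for row in data for v in row if v is not None]
    low_cutoff = quantiles(observed, n=4)[0]  # first quartile of every observed value
    labels = []
    for row in data:
        obs = [v for v in row if v is not None]
        labels.append("MNAR" if not obs or mean(obs) < low_cutoff else "MAR")
    return labels


def impute(data: Matrix) -> List[List[float]]:
    """Route each feature to a different stand-in imputer according to its label."""
    global_min = min(v for row in data for v in row if v is not None)
    filled = []
    for row, label in zip(data, classify_missing(data)):
        obs = [v for v in row if v is not None]
        if label == "MNAR":
            # Left-censored: fill below the detection limit (half the lowest observed value).
            fill = (min(obs) if obs else global_min) / 2
        else:
            # Missing at random: the feature mean stands in for a MAR method such as BPCA.
            fill = mean(obs)
        filled.append([fill if v is None else v for v in row])
    return filled


def nrmse(truth: List[List[float]], filled: List[List[float]], masked: Matrix) -> float:
    """sqrt(MSE / variance of the true values), scored only on entries that were masked.

    This is one common NRMSE definition; studies differ in the normaliser they use.
    """
    pairs = [
        (t, f)
        for t_row, f_row, m_row in zip(truth, filled, masked)
        for t, f, m in zip(t_row, f_row, m_row)
        if m is None
    ]
    true_vals = [t for t, _ in pairs]
    mu = mean(true_vals)
    mse = mean((t - f) ** 2 for t, f in pairs)
    var = mean((t - mu) ** 2 for t in true_vals)
    return math.sqrt(mse / var)


if __name__ == "__main__":
    truth = [
        [100.0, 110.0, 105.0, 95.0],  # high-abundance feature
        [50.0, 55.0, 52.0, 48.0],
        [5.0, 4.0, 6.0, 3.0],  # low-abundance feature, prone to left-censoring
    ]
    masked = [
        [100.0, None, 105.0, 95.0],
        [50.0, 55.0, None, 48.0],
        [5.0, None, 6.0, None],
    ]
    print(classify_missing(masked))  # ['MAR', 'MAR', 'MNAR']
    print(round(nrmse(truth, impute(masked), masked), 3))
```

A typical benchmark of this kind masks known values in a complete dataset, imputes them, and scores only the masked entries, which is what `nrmse` does here. Replacing the two stand-in imputers with established methods (for example, quantile regression imputation of left-censored data for MNAR and Bayesian PCA for MAR) would change only the branches inside `impute`.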


Similar articles

1. ProJect: a powerful mixed-model missing value imputation method.
   Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad233.
2. Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.
   Sci Rep. 2018 Jan 12;8(1):663. doi: 10.1038/s41598-017-19120-0.
3. A hybrid imputation approach for microarray missing value estimation.
   BMC Genomics. 2015;16 Suppl 9(Suppl 9):S1. doi: 10.1186/1471-2164-16-S9-S1. Epub 2015 Aug 17.
4. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.
   Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24.
5. Iterative bicluster-based Bayesian principal component analysis and least squares for missing-value imputation in microarray and RNA-sequencing data.
   Math Biosci Eng. 2022 Jun 16;19(9):8741-8759. doi: 10.3934/mbe.2022405.
6. Random forest-based imputation outperforms other methods for imputing LC-MS metabolomics data: a comparative study.
   BMC Bioinformatics. 2019 Oct 11;20(1):492. doi: 10.1186/s12859-019-3110-0.
7. A comparative study of evaluating missing value imputation methods in label-free proteomics.
   Sci Rep. 2021 Jan 19;11(1):1760. doi: 10.1038/s41598-021-81279-4.
8. Robust imputation method for missing values in microarray data.
   BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S6. doi: 10.1186/1471-2105-8-S2-S6.
9. Advanced methods for missing values imputation based on similarity learning.
   PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
10. Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics.
   Int J Mol Sci. 2021 Sep 6;22(17):9650. doi: 10.3390/ijms22179650.

Articles citing this paper

1. Optimizing imputation strategies for mass spectrometry-based proteomics considering intensity and missing value rates.
   Comput Struct Biotechnol J. 2025 May 3;27:1818-1826. doi: 10.1016/j.csbj.2025.04.041. eCollection 2025.
2. PEPerMINT: peptide abundance imputation in mass spectrometry-based proteomics using graph neural networks.
   Bioinformatics. 2024 Sep 1;40(Suppl 2):ii70-ii78. doi: 10.1093/bioinformatics/btae389.
3. Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning.
   Nat Commun. 2024 Jun 26;15(1):5405. doi: 10.1038/s41467-024-48711-5.
4. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference.
   Nat Commun. 2024 May 9;15(1):3922. doi: 10.1038/s41467-024-47899-w.