
rMisbeta: A robust missing value imputation approach in transcriptomics and metabolomics data.

Affiliations

Department of Statistics, Begum Rokeya University, Rangpur, 5400, Bangladesh.

Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia.

Publication information

Comput Biol Med. 2021 Nov;138:104911. doi: 10.1016/j.compbiomed.2021.104911. Epub 2021 Sep 29.

DOI: 10.1016/j.compbiomed.2021.104911
PMID: 34634637
Abstract

Transcriptomics and metabolomics data often contain missing values or outliers because of limitations in data acquisition techniques. Most statistical methods require complete datasets for downstream analysis. Many missing value imputation methods rely on the classical mean and variance based on maximum likelihood estimation, which are not robust against outliers; consequently, their performance deteriorates when outliers are present. Precise imputation of missing values and handling of outliers are therefore both important and must be addressed together. In this paper, we developed a robust iterative approach that uses robust estimators based on the minimum beta divergence method and simultaneously imputes missing values and outliers. We compared the performance of the proposed method with six frequently used missing value imputation methods, namely Zero, KNN, robust SVD, EM, random forest (RF), and the weighted least squares approach (WLSA), through feature selection on both simulated and real datasets. Ten performance indices were used to identify the best method: Frobenius norm (FOBN), accuracy (ACC), sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), detection rate (DR), misclassification error rate (MER), the area under the ROC curve (AUC), and computational runtime. Evaluation on both simulated and real data suggests that the proposed method outperforms the other methods across various rates of outliers and missing values. In the absence of outliers, the proposed approach performs almost as well as the other methods. The proposed method is accurate, simple, and requires less computational time than the other methods. We therefore recommend the proposed procedure for large-scale transcriptomics and metabolomics data analysis.
The computational tool has been implemented in an R package, which is publicly available from https://CRAN.R-project.org/package=rMisbeta.
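
The abstract describes the method only at a high level, so the following is a minimal Python sketch of the general idea rather than the rMisbeta implementation. It assumes a univariate Gaussian model per feature, the commonly used beta-weight exp(-beta/2 * z^2) for the minimum beta divergence estimators, and a hypothetical weight cutoff for flagging outliers. The function names, the default beta = 0.2, and the cutoff are all illustrative assumptions and are not taken from the paper or the R package.

```python
import math
from statistics import median


def beta_robust_mean_var(values, beta=0.2, n_iter=100, tol=1e-8):
    """Iteratively reweighted mean and variance (assumed Gaussian model).

    Each observation is weighted by exp(-beta/2 * squared z-score), so points
    far from the current centre contribute almost nothing to the update.
    """
    mu = median(values)
    # Robust starting scale: median absolute deviation, rescaled for a Gaussian.
    mad = median(abs(v - mu) for v in values)
    var = (1.4826 * mad) ** 2 or 1.0  # fall back to 1.0 if the MAD is zero
    for _ in range(n_iter):
        w = [math.exp(-0.5 * beta * (v - mu) ** 2 / var) for v in values]
        sw = sum(w)
        new_mu = sum(wi * v for wi, v in zip(w, values)) / sw
        # The (1 + beta) factor offsets the shrinkage caused by down-weighting.
        new_var = (1 + beta) * sum(
            wi * (v - new_mu) ** 2 for wi, v in zip(w, values)
        ) / sw
        new_var = max(new_var, 1e-12)  # guard against a degenerate scale
        converged = abs(new_mu - mu) + abs(new_var - var) < tol
        mu, var = new_mu, new_var
        if converged:
            break
    return mu, var


def impute_row(row, beta=0.2, weight_cutoff=0.2):
    """Replace missing entries (None) and low-weight outliers with the robust mean.

    weight_cutoff is a hypothetical threshold chosen for illustration only.
    """
    observed = [v for v in row if v is not None]
    mu, var = beta_robust_mean_var(observed, beta=beta)
    result = []
    for v in row:
        if v is None:
            result.append(mu)  # missing value: impute
        elif math.exp(-0.5 * beta * (v - mu) ** 2 / var) < weight_cutoff:
            result.append(mu)  # outlier under the assumed cutoff: replace
        else:
            result.append(v)   # ordinary observation: keep unchanged
    return result


if __name__ == "__main__":
    # One feature measured across seven samples: one missing value, one outlier.
    example = [5.1, 4.9, 5.0, None, 5.2, 4.8, 50.0]
    print([round(v, 3) for v in impute_row(example)])
```

Because the weights decay exponentially with distance from the current centre, a gross outlier has essentially no influence on the estimates, which is the property the abstract contrasts with the classical mean and variance.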


Similar articles

1
rMisbeta: A robust missing value imputation approach in transcriptomics and metabolomics data.
Comput Biol Med. 2021 Nov;138:104911. doi: 10.1016/j.compbiomed.2021.104911. Epub 2021 Sep 29.
2
Kernel weighted least square approach for imputing missing values of metabolomics data.
Sci Rep. 2021 May 27;11(1):11108. doi: 10.1038/s41598-021-90654-0.
3
Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.
Sci Rep. 2018 Jan 12;8(1):663. doi: 10.1038/s41598-017-19120-0.
4
NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data.
Molecules. 2021 Sep 24;26(19):5787. doi: 10.3390/molecules26195787.
5
NS-kNN: a modified k-nearest neighbors approach for imputing metabolomics data.
Metabolomics. 2018 Nov 23;14(12):153. doi: 10.1007/s11306-018-1451-8.
6
Metabolomic Biomarker Identification in Presence of Outliers and Missing Values.
Biomed Res Int. 2017;2017:2437608. doi: 10.1155/2017/2437608. Epub 2017 Feb 14.
7
Missing value imputation in high-dimensional phenomic data: imputable or not, and how?
BMC Bioinformatics. 2014 Nov 5;15(1):346. doi: 10.1186/s12859-014-0346-6.
8
GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies.
PLoS Comput Biol. 2018 Jan 31;14(1):e1005973. doi: 10.1371/journal.pcbi.1005973. eCollection 2018 Jan.
9
Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies.
BMC Bioinformatics. 2017 Feb 20;18(1):114. doi: 10.1186/s12859-017-1547-6.
10
GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis.
Bioinformatics. 2020 Jan 1;36(1):257-263. doi: 10.1093/bioinformatics/btz488.

Cited by

1
Exposure-inducible genes may contribute to missingness in RNAseq-based gene expression analyses.
Sci Rep. 2025 Aug 22;15(1):30889. doi: 10.1038/s41598-025-14395-0.
2
Untargeted pixel-by-pixel metabolite ratio imaging as a novel tool for biomedical discovery in mass spectrometry imaging.
Elife. 2025 Mar 18;13:RP96892. doi: 10.7554/eLife.96892.
3
Targeted plasma metabolomics reveals potential biomarkers of the elderly with mild cognitive impairment in Qingdao rural area.
Front Aging Neurosci. 2024 Dec 18;16:1511437. doi: 10.3389/fnagi.2024.1511437. eCollection 2024.
4
A practical introduction to holo-omics.
Cell Rep Methods. 2024 Jul 15;4(7):100820. doi: 10.1016/j.crmeth.2024.100820. Epub 2024 Jul 9.
5
One-Year Effects of High-Intensity Statin on Bioactive Lipids: Findings From the JUPITER Trial.
Arterioscler Thromb Vasc Biol. 2024 Jul;44(7):e196-e206. doi: 10.1161/ATVBAHA.124.321058. Epub 2024 Jun 6.
6
Machine Learning Methods for Survival Analysis with Clinical and Transcriptomics Data of Breast Cancer.
Methods Mol Biol. 2023;2553:325-393. doi: 10.1007/978-1-0716-2617-7_16.
7
Data Processing and Analysis in Mass Spectrometry-Based Metabolomics.
Methods Mol Biol. 2023;2571:207-239. doi: 10.1007/978-1-0716-2699-3_20.