Censored Least Squares for Imputing Missing Values in PARAFAC Tensor Factorization.

Authors

Hung Ethan C, Hodzic Enio, Tan Zhixin Cyrillus, Meyer Aaron S

Affiliations

Computational and Systems Biology, University of California, Los Angeles (UCLA), USA.

Department of Bioengineering, UCLA, USA.

Publication information

bioRxiv. 2024 Jul 10:2024.07.05.602272. doi: 10.1101/2024.07.05.602272.

DOI:10.1101/2024.07.05.602272

PMID:39026852

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11257416/

Abstract

Tensor factorization is a dimensionality reduction method applied to multidimensional arrays. These methods are useful for identifying patterns within a variety of biomedical datasets due to their ability to preserve the organizational structure of experiments and therefore aid in generating meaningful insights. However, missing data in the datasets being analyzed can impose challenges. Tensor factorization can be performed with some level of missing data and reconstruct a complete tensor. However, while tensor methods may impute these missing values, the choice of fitting algorithm may influence the fidelity of these imputations. Previous approaches, based on alternating least squares with prefilled values or direct optimization, suffer from introduced bias or slow computational performance. In this study, we propose that censored least squares can better handle missing values with data structured in tensor form. We ran censored least squares on four different biological datasets and compared its performance against alternating least squares with prefilled values and direct optimization. We used the error of imputation and the ability to infer masked values to benchmark their missing data performance. Censored least squares appeared best suited for the analysis of high-dimensional biological data by accuracy and convergence metrics across several studies.
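The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common reading of censored least squares inside a PARAFAC alternating loop: each row of a factor matrix is solved using only the entries that were actually observed, so missing values never need a prefilled guess. It is not the authors' implementation. The function names (`khatri_rao`, `reconstruct`, `censored_als`), the use of NaN to mark missing entries, the iteration count, and the synthetic test tensor are all assumptions made for illustration.

```python
import numpy as np


def khatri_rao(mats):
    """Column-wise Kronecker product of a list of (n_k, rank) matrices."""
    rank = mats[0].shape[1]
    out = mats[0]
    for m in mats[1:]:
        # Later matrices vary fastest, matching a C-order flatten of the modes.
        out = (out[:, None, :] * m[None, :, :]).reshape(-1, rank)
    return out


def reconstruct(factors):
    """Rebuild the full tensor as the sum of rank-one components."""
    shape = tuple(f.shape[0] for f in factors)
    return khatri_rao(factors).sum(axis=1).reshape(shape)


def censored_als(tensor, rank, n_iter=300, seed=0):
    """PARAFAC fit in which every least-squares solve skips NaN entries."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(tensor)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(tensor.ndim):
            n_rows = tensor.shape[mode]
            # Unfold so rows index `mode`; the other modes are flattened.
            unfolded = np.moveaxis(tensor, mode, 0).reshape(n_rows, -1)
            mask = np.moveaxis(observed, mode, 0).reshape(n_rows, -1)
            design = khatri_rao([f for m, f in enumerate(factors) if m != mode])
            for i in range(n_rows):
                cols = mask[i]
                if cols.any():
                    # Censored step: keep only equations with observed data.
                    factors[mode][i] = np.linalg.lstsq(
                        design[cols], unfolded[i, cols], rcond=None
                    )[0]
    return factors


# Demo on synthetic data (not from the paper): hide 20% of a rank-2 tensor.
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((dim, 2)) for dim in (8, 7, 6)]
full = reconstruct(true_factors)
hidden = rng.random(full.shape) < 0.2
data = np.where(hidden, np.nan, full)

fitted = censored_als(data, rank=2)
imputed = reconstruct(fitted)
rel_err = np.linalg.norm((imputed - full)[hidden]) / np.linalg.norm(full[hidden])
print(f"relative imputation error on hidden entries: {rel_err:.2e}")
```

The per-row solve is what distinguishes this from alternating least squares with prefilled values, where one shared least-squares problem is solved per mode after the gaps have been filled in. The masking-and-recovery check at the end mirrors, in spirit, the abstract's benchmark of inferring masked values.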
