Biwhitening Reveals the Rank of a Count Matrix.

Authors

Landa Boris, Zhang Thomas T C K, Kluger Yuval

Affiliations

Program in Applied Mathematics, Yale University.

Department of Electrical and Systems Engineering, University of Pennsylvania.

Publication information

SIAM J Math Data Sci. 2022;4(4):1420-1446. doi: 10.1137/21m1456807.

PMID: 37576699
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10417917/
Abstract

Estimating the rank of a corrupted data matrix is an important task in data analysis, most notably for choosing the number of components in PCA. Significant progress on this task was achieved using random matrix theory by characterizing the spectral properties of large noise matrices. However, utilizing such tools is not straightforward when the data matrix consists of count random variables, e.g., Poisson, in which case the noise can be heteroskedastic with an unknown variance in each entry. In this work, we focus on a Poisson random matrix with independent entries and propose a simple procedure, termed biwhitening, for estimating the rank of the underlying signal matrix (i.e., the Poisson parameter matrix) without any prior knowledge. Our approach is based on the key observation that one can scale the rows and columns of the data matrix simultaneously so that the spectrum of the corresponding noise agrees with the standard Marchenko-Pastur (MP) law, justifying the use of the MP upper edge as a threshold for rank selection. Importantly, the required scaling factors can be estimated directly from the observations by solving a matrix scaling problem via the Sinkhorn-Knopp algorithm. Aside from the Poisson, our approach is extended to families of distributions that satisfy a quadratic relation between the mean and the variance, such as the generalized Poisson, binomial, negative binomial, gamma, and many others. This quadratic relation can also account for missing entries in the data. We conduct numerical experiments that corroborate our theoretical findings, and showcase the advantage of our approach for rank estimation in challenging regimes. Furthermore, we demonstrate the favorable performance of our approach on several real datasets of single-cell RNA sequencing (scRNA-seq), High-Throughput Chromosome Conformation Capture (Hi-C), and document topic modeling.
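
The procedure outlined in the abstract (scale rows and columns with Sinkhorn-Knopp, then count the singular values above the MP upper edge) can be illustrated with a minimal Python sketch for the Poisson case. This is not the authors' implementation. Several details are assumptions of the sketch rather than statements from the abstract: the raw counts are used as the entrywise variance estimate (for a Poisson variable the variance equals the mean), the scaling targets are row sums of n and column sums of m for an m-by-n matrix, and the function names `sinkhorn_knopp`, `biwhiten`, and `estimate_rank` are hypothetical.

```python
import numpy as np


def sinkhorn_knopp(A, row_sums, col_sums, max_iter=1000, tol=1e-8):
    """Return positive vectors u, v such that diag(u) @ A @ diag(v) has
    (approximately) the requested row and column sums.

    Assumes A is nonnegative with no all-zero row or column.
    """
    u = np.ones(A.shape[0])
    v = np.ones(A.shape[1])
    for _ in range(max_iter):
        u = row_sums / (A @ v)    # rescale rows to hit the row sums
        v = col_sums / (A.T @ u)  # rescale columns to hit the column sums
        # Column sums are exact after the v-update; check the row sums.
        if np.max(np.abs(u * (A @ v) - row_sums)) < tol:
            break
    return u, v


def biwhiten(Y):
    """Scale rows and columns of the count matrix Y (assumed Poisson).

    Assumption of this sketch: Y itself serves as the variance estimate,
    and the scaled variance estimate has row sums n and column sums m.
    """
    m, n = Y.shape
    u, v = sinkhorn_knopp(Y, np.full(m, float(n)), np.full(n, float(m)))
    # Variances scale by u_i * v_j, so the data scale by their square roots.
    return np.sqrt(u)[:, None] * Y * np.sqrt(v)[None, :]


def estimate_rank(Y):
    """Count singular values of the scaled matrix above the MP upper edge."""
    m, n = Y.shape
    s = np.linalg.svd(biwhiten(Y), compute_uv=False)
    # MP upper edge for eigenvalues of (1/n) * Yh @ Yh.T is (1 + sqrt(m/n))**2,
    # which corresponds to sqrt(m) + sqrt(n) for the singular values of Yh.
    mp_edge = np.sqrt(m) + np.sqrt(n)
    return int(np.sum(s > mp_edge)), s, mp_edge


# Synthetic example: a rank-3 Poisson parameter matrix with block structure.
rng = np.random.default_rng(0)
M = np.array([[20.0, 2.0, 2.0], [2.0, 20.0, 2.0], [2.0, 2.0, 20.0]])
X = np.kron(M, np.ones((50, 100)))  # 150 x 300, rank 3
Y = rng.poisson(X)
r_hat, s, mp_edge = estimate_rank(Y)
print("estimated rank:", r_hat)
print("top singular values:", np.round(s[:5], 1), "MP edge:", round(mp_edge, 1))
```

In this synthetic example the three signal singular values sit far above the threshold, while the remaining ones cluster near or below it. Because the largest noise singular value fluctuates around the MP edge in finite samples, the count can occasionally be off by one when a noise value lands just above the threshold.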
