From local to global gene co-expression estimation using single-cell RNA-seq data.

Affiliation

Department of Statistics and Data Science, Carnegie Mellon University, 15213, Pittsburgh, PA, United States.

Publication information

Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujae001.

Abstract

In genomics studies, investigating gene relationships often yields important biological insights. Large heterogeneous datasets now pose new challenges for statisticians because gene relationships are often local: they change from one sample point to another, may exist only in a subset of the sample, and can be nonlinear or even nonmonotone. Most previous dependence measures do not specifically target local dependence relationships, and those that do are computationally costly. In this paper, we explore a state-of-the-art network estimation technique that characterizes gene relationships at the single-cell level, known as cell-specific gene networks. We first show that averaging the cell-specific gene relationship over a population gives a novel univariate dependence measure, the averaged Local Density Gap (aLDG), that accumulates local dependence and can detect any nonlinear, nonmonotone relationship. We provide a consistent nonparametric estimator and establish the measure's robustness at both the population and empirical levels. Then, we show that averaging the cell-specific gene relationship over mini-batches determined by external structural information (e.g., a spatial or temporal factor) better highlights meaningful local structure change points. We explore applications of aLDG and its mini-batch variant in several scenarios, including pairwise gene relationship estimation, bifurcation point detection in cell trajectories, and spatial transcriptomics structure visualization. Both simulations and real data analysis show that aLDG outperforms existing dependence measures.
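
The idea of averaging a per-cell dependence signal can be illustrated with a small sketch. The abstract does not state the formula, so everything below is an assumption made for illustration rather than the paper's definition: the per-cell gap is taken as the kernel estimate of the joint density minus the product of the marginal estimates, scaled by the square root of that product; the bandwidth `n ** (-1/6)` and the threshold `0.1` are arbitrary choices; and the function names `local_density_gaps`, `averaged_gap_score`, and `minibatch_scores` are hypothetical.

```python
import math
import random


def standardize(values):
    """Center and scale a sequence to mean 0 and unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def local_density_gaps(xs, ys):
    """Per-cell gap between the joint density and the product of marginals.

    Assumed form (not taken from the abstract):
        (f_xy - f_x * f_y) / sqrt(f_x * f_y),
    with Gaussian kernel density estimates evaluated at each cell.
    """
    n = len(xs)
    xs, ys = standardize(xs), standardize(ys)
    h = n ** (-1.0 / 6.0)  # rule-of-thumb bandwidth (illustrative assumption)
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))

    def kernel(d):
        return norm * math.exp(-0.5 * (d / h) ** 2)

    gaps = []
    for i in range(n):
        kx = [kernel(xs[i] - xs[j]) for j in range(n)]
        ky = [kernel(ys[i] - ys[j]) for j in range(n)]
        f_x = sum(kx) / n
        f_y = sum(ky) / n
        f_xy = sum(a * b for a, b in zip(kx, ky)) / n
        gaps.append((f_xy - f_x * f_y) / math.sqrt(f_x * f_y))
    return gaps


def averaged_gap_score(gaps, threshold=0.1):
    """Fraction of cells whose local gap exceeds the (assumed) threshold."""
    return sum(g > threshold for g in gaps) / len(gaps)


def minibatch_scores(gaps, batch_size, threshold=0.1):
    """Average the same indicator within consecutive batches of cells.

    Cells are assumed to be pre-sorted by an external factor such as pseudotime.
    """
    return [
        averaged_gap_score(gaps[s:s + batch_size], threshold)
        for s in range(0, len(gaps), batch_size)
    ]


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(300)]
y_dependent = [v * v + rng.gauss(0.0, 0.05) for v in x]  # nonmonotone relationship
y_independent = [rng.gauss(0.0, 1.0) for _ in range(300)]

score_dependent = averaged_gap_score(local_density_gaps(x, y_dependent))
score_independent = averaged_gap_score(local_density_gaps(x, y_independent))
print(score_dependent, score_independent)
```

In this toy setting the quadratic pair is nonmonotone, so a linear correlation coefficient would be close to zero, while the averaged indicator should be clearly larger for the dependent pair than for the independent one. Calling `minibatch_scores` on gaps ordered by a temporal or spatial covariate gives one score per batch, which is one possible reading of the mini-batch variant described above.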
