Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States.
Biometrics. 2024 Jan 29;80(1). doi: 10.1093/biomtc/ujae001.
In genomics studies, investigating relationships between genes often yields important biological insights. Large heterogeneous datasets pose new challenges for statisticians because gene relationships are often local: they change from one sample point to another, may exist only in a subset of the sample, and can be nonlinear or even nonmonotone. Most previous dependence measures do not specifically target local dependence relationships, and the ones that do are computationally costly. In this paper, we explore a state-of-the-art network estimation technique, known as cell-specific gene networks, that characterizes gene relationships at the single-cell level. We first show that averaging the cell-specific gene relationship over a population yields a novel univariate dependence measure, the averaged Local Density Gap (aLDG), which accumulates local dependence and can detect any nonlinear, nonmonotone relationship. We also provide a consistent nonparametric estimator and establish the robustness of aLDG at both the population and empirical levels. We then show that averaging the cell-specific gene relationship over mini-batches determined by external structural information (e.g., spatial or temporal factors) better highlights meaningful local structural change points. We explore applications of aLDG and its mini-batch variant in many scenarios, including pairwise gene relationship estimation, bifurcation point detection in cell trajectories, and visualization of spatial transcriptomics structure. Both simulations and real data analysis show that aLDG outperforms existing dependence measures.
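To make the averaging idea concrete, here is a minimal Python/NumPy sketch. It is not the authors' aLDG estimator: the per-cell statistic is a hypothetical box-count test, and the neighborhood fraction `frac`, the hypergeometric standardization, the threshold `z_thresh`, and all function names are assumptions chosen for illustration. Each cell receives a 0/1 flag for local dependence; averaging the flags over all cells gives a population-level score, and averaging them within externally defined groups mimics the mini-batch variant.

```python
import numpy as np


def local_gap_indicators(x, y, frac=0.1, z_thresh=1.645):
    """Hypothetical per-cell test of local dependence between genes x and y.

    Assumed box-count statistic (not the paper's estimator): among the other
    n - 1 cells, take the m cells nearest to x[k] in x and the m cells nearest
    to y[k] in y. Under independence the overlap count n_xy is hypergeometric
    with mean m * m / (n - 1); a large standardized excess flags cell k.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_other = x.size - 1
    m = max(2, int(frac * n_other))  # neighborhood size per gene (assumed)
    if m >= n_other:
        raise ValueError("need more cells than the neighborhood size")
    mean = m * m / n_other
    # Hypergeometric standard deviation of the overlap count under independence
    sd = np.sqrt(m * (m / n_other) * ((n_other - m) / n_other)
                 * ((n_other - m) / (n_other - 1)))
    flags = np.zeros(x.size, dtype=bool)
    for k in range(x.size):
        others = np.arange(x.size) != k
        near_x = np.zeros(n_other, dtype=bool)
        near_y = np.zeros(n_other, dtype=bool)
        near_x[np.argsort(np.abs(x[others] - x[k]))[:m]] = True
        near_y[np.argsort(np.abs(y[others] - y[k]))[:m]] = True
        n_xy = np.count_nonzero(near_x & near_y)
        flags[k] = (n_xy - mean) / sd > z_thresh
    return flags


def averaged_local_gap(x, y, **kwargs):
    """Population-level score: fraction of cells flagged as locally dependent."""
    return float(np.mean(local_gap_indicators(x, y, **kwargs)))


def minibatch_average(flags, batch_labels):
    """Average per-cell flags within each externally defined batch (e.g., a time bin)."""
    flags = np.asarray(flags, dtype=float)
    labels = np.asarray(batch_labels)
    return {lab.item(): float(flags[labels == lab].mean()) for lab in np.unique(labels)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 300)
    y_dep = x**2 + 0.01 * rng.normal(size=300)  # nonmonotone dependence on x
    y_ind = rng.uniform(-1.0, 1.0, 300)         # independent of x
    print("dependent:  ", averaged_local_gap(x, y_dep))
    print("independent:", averaged_local_gap(x, y_ind))
    print("Pearson r (dependent):", np.corrcoef(x, y_dep)[0, 1])
    flags = local_gap_indicators(x, y_dep)
    print(minibatch_average(flags, np.where(x < 0, "left", "right")))
```

On the simulated `y = x**2` relationship, a global Pearson correlation stays near zero because the dependence is symmetric and nonmonotone, whereas the averaged flag rate should be far above its value for independent genes, which stays near the nominal false-positive rate of the assumed threshold.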