Dissecting gene expression heterogeneity: generalized Pearson correlation squares and the K-lines clustering algorithm.

Author information

Li Jingyi Jessica, Zhou Heather J, Bickel Peter J, Tong Xin

Affiliations

Department of Statistics, University of California, Los Angeles.

Department of Statistics, University of California, Berkeley.

Publication information

J Am Stat Assoc. 2024;119(548):2450-2463. doi: 10.1080/01621459.2024.2342639. Epub 2024 May 24.

Abstract

Motivated by the pressing needs for dissecting heterogeneous relationships in gene expression data, here we generalize the squared Pearson correlation to capture a mixture of linear dependences between two real-valued variables, with or without an index variable that specifies the line memberships. We construct the generalized Pearson correlation squares by focusing on three aspects: variable exchangeability, no parametric model assumptions, and inference of population-level parameters. To compute the generalized Pearson correlation square from a sample without a line-membership specification, we develop a K-lines clustering algorithm to find K clusters that exhibit distinct linear dependences, where K can be chosen in a data-adaptive way. To infer the population-level generalized Pearson correlation squares, we derive the asymptotic distributions of the sample-level statistics to enable efficient statistical inference. Simulation studies verify the theoretical results and show the power advantage of the generalized Pearson correlation squares in capturing mixtures of linear dependences. Gene expression data analyses demonstrate the effectiveness of the generalized Pearson correlation squares and the K-lines clustering algorithm in dissecting complex but interpretable relationships. The estimation and inference procedures are implemented in the R package gR2 (https://github.com/lijy03/gR2).
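
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the gR2 implementation and does not use the package's API. It rests on two assumptions that the abstract does not spell out: that the generalized Pearson correlation square is a proportion-weighted sum of the within-line squared Pearson correlations, and that K-lines clustering alternates between fitting each line by total least squares and reassigning each point to the line with the smallest perpendicular distance. All function names (`fit_line_tls`, `k_lines`, `generalized_r2`) are hypothetical.

```python
import numpy as np


def fit_line_tls(points):
    """Fit a line by total least squares; return (center, unit direction)."""
    center = points.mean(axis=0)
    # The first right-singular vector of the centered data minimizes the
    # sum of squared perpendicular distances to a line through the center.
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[0]


def perp_dist_sq(points, center, direction):
    """Squared perpendicular distance from each point to the given line."""
    diff = points - center
    along = diff @ direction
    return np.sum(diff ** 2, axis=1) - along ** 2


def k_lines(points, k, n_iter=100, seed=0):
    """Assumed K-lines loop: refit lines, then reassign points, until stable."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(points))
    for _ in range(n_iter):
        lines = []
        for j in range(k):
            members = points[labels == j]
            if len(members) < 2:
                # Re-seed a degenerate cluster with two randomly chosen points.
                members = points[rng.choice(len(points), size=2, replace=False)]
            lines.append(fit_line_tls(members))
        dists = np.column_stack([perp_dist_sq(points, c, d) for c, d in lines])
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def generalized_r2(x, y, labels):
    """Assumed form: sum over lines of (cluster proportion) * (Pearson r)^2."""
    total = 0.0
    for j in np.unique(labels):
        mask = labels == j
        if mask.sum() < 3:
            continue  # too few points for a meaningful correlation
        r = np.corrcoef(x[mask], y[mask])[0, 1]
        total += mask.mean() * r ** 2
    return total


# Simulated example: two crossing lines, y = x and y = -x, with small noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=400)
z = rng.integers(2, size=400)  # true line memberships (the index variable)
y = np.where(z == 0, x, -x) + rng.normal(scale=0.05, size=400)

plain_r2 = np.corrcoef(x, y)[0, 1] ** 2           # ordinary squared correlation
specified_r2 = generalized_r2(x, y, z)            # memberships known
labels = k_lines(np.column_stack([x, y]), k=2)    # memberships inferred
unspecified_r2 = generalized_r2(x, y, labels)
```

On data like this the ordinary squared correlation is close to zero because the two slopes cancel, while the version computed with known memberships is close to one. The clustering loop can stop at a local optimum, so in practice several random restarts (different `seed` values) would be a sensible precaution, and the data-adaptive choice of K mentioned in the abstract is not covered by this sketch.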

