
Dissecting gene expression heterogeneity: generalized Pearson correlation squares and the K-lines clustering algorithm.

Authors

Li Jingyi Jessica, Zhou Heather J, Bickel Peter J, Tong Xin

Affiliations

Department of Statistics, University of California, Los Angeles.

Department of Statistics, University of California, Berkeley.

Publication information

J Am Stat Assoc. 2024;119(548):2450-2463. doi: 10.1080/01621459.2024.2342639. Epub 2024 May 24.

DOI:10.1080/01621459.2024.2342639
PMID:39697782
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11651632/
Abstract

Motivated by the pressing need to dissect heterogeneous relationships in gene expression data, here we generalize the squared Pearson correlation to capture a mixture of linear dependences between two real-valued variables, with or without an index variable that specifies the line memberships. We construct the generalized Pearson correlation squares by focusing on three aspects: variable exchangeability, no parametric model assumptions, and inference of population-level parameters. To compute the generalized Pearson correlation square from a sample without a line-membership specification, we develop a K-lines clustering algorithm to find K clusters that exhibit distinct linear dependences, where K can be chosen in a data-adaptive way. To infer the population-level generalized Pearson correlation squares, we derive the asymptotic distributions of the sample-level statistics to enable efficient statistical inference. Simulation studies verify the theoretical results and show the power advantage of the generalized Pearson correlation squares in capturing mixtures of linear dependences. Gene expression data analyses demonstrate the effectiveness of the generalized Pearson correlation squares and the K-lines clustering algorithm in dissecting complex but interpretable relationships. The estimation and inference procedures are implemented in the R package gR2 (https://github.com/lijy03/gR2).
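
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the gR2 package and does not reproduce its API; every function name below is hypothetical. Two assumptions go beyond what the abstract states: (1) when line memberships are known, the sample statistic is taken to be the membership-proportion-weighted average of the within-group squared Pearson correlations, and (2) the clustering step alternates between assigning each point to the nearest line by perpendicular distance and refitting each line as the first principal direction of its cluster, which treats the two variables symmetrically. K is fixed by the caller here; the data-adaptive choice of K is not sketched.

```python
import numpy as np


def generalized_r2(x, y, labels):
    """Assumed form: proportion-weighted average of within-group squared correlations."""
    x, y, labels = np.asarray(x, float), np.asarray(y, float), np.asarray(labels)
    total = 0.0
    for group in np.unique(labels):
        mask = labels == group
        if mask.sum() < 3 or np.std(x[mask]) == 0 or np.std(y[mask]) == 0:
            continue  # correlation undefined for this group; it contributes 0
        r = np.corrcoef(x[mask], y[mask])[0, 1]
        total += mask.mean() * r ** 2
    return total


def fit_line(points):
    """Total-least-squares line: centroid plus first principal direction."""
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[0]


def perp_dist(points, center, direction):
    """Perpendicular distance from each point to the line (center, direction)."""
    diff = points - center
    along = diff @ direction
    return np.sqrt(np.maximum((diff ** 2).sum(axis=1) - along ** 2, 0.0))


def k_lines(x, y, k, n_init=20, max_iter=100, seed=0):
    """Hypothetical K-lines sketch with random restarts; returns cluster labels."""
    points = np.column_stack([x, y]).astype(float)
    rng = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        labels = rng.integers(k, size=len(points))  # random initial memberships
        for _ in range(max_iter):
            dists = np.full((len(points), k), np.inf)
            for j in range(k):
                members = points[labels == j]
                if len(members) >= 2:  # a line needs at least two points
                    dists[:, j] = perp_dist(points, *fit_line(members))
            new_labels = dists.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break  # memberships stable: converged
            labels = new_labels
        cost = (dists[np.arange(len(points)), labels] ** 2).sum()
        if best_labels is None or cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


# Synthetic data: an equal mixture of the lines y = 2x and y = -2x plus small noise.
rng = np.random.default_rng(1)
n = 400
x = rng.uniform(-1.0, 1.0, n)
true_labels = np.arange(n) % 2
y = np.where(true_labels == 0, 2.0, -2.0) * x + rng.normal(0.0, 0.05, n)

r2_pooled = np.corrcoef(x, y)[0, 1] ** 2                  # ordinary squared correlation
r2_known = generalized_r2(x, y, true_labels)              # memberships given
r2_found = generalized_r2(x, y, k_lines(x, y, k=2))       # memberships inferred
print(f"pooled={r2_pooled:.3f} known={r2_known:.3f} inferred={r2_found:.3f}")
```

On this synthetic mixture the pooled squared correlation should be close to zero because the two slopes cancel, while the membership-aware statistic should be close to one. The random restarts are there because alternating assignment and refitting can stall in a poor local solution.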




