
An efficient, not-only-linear correlation coefficient based on clustering.

Affiliations

Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Publication information

Cell Syst. 2024 Sep 18;15(9):854-868.e3. doi: 10.1016/j.cels.2024.08.005. Epub 2024 Sep 6.

DOI:10.1016/j.cels.2024.08.005
PMID:39243756
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11951854/
Abstract

Identifying meaningful patterns in data is crucial for understanding complex biological processes, particularly in transcriptomics, where genes with correlated expression often share functions or contribute to disease mechanisms. Traditional correlation coefficients, which primarily capture linear relationships, may overlook important nonlinear patterns. We introduce the clustermatch correlation coefficient (CCC), a not-only-linear coefficient that utilizes clustering to efficiently detect both linear and nonlinear associations. CCC outperforms standard methods by revealing biologically meaningful patterns that linear-only coefficients miss and is faster than state-of-the-art coefficients such as the maximal information coefficient. When applied to human gene expression data from genotype-tissue expression (GTEx), CCC identified robust linear relationships and nonlinear patterns, such as sex-specific differences, that are undetectable by standard methods. Highly ranked gene pairs were enriched for interactions in integrated networks built from protein-protein interactions, transcription factor regulation, and chemical and genetic perturbations, suggesting that CCC can detect functional relationships missed by linear-only approaches. CCC is a highly efficient, next-generation, not-only-linear correlation coefficient for genome-scale data. A record of this paper's transparent peer review process is included in the supplemental information.
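
The abstract says only that CCC "utilizes clustering" to detect linear and nonlinear associations; it does not give the computation. As a rough illustration of that general idea, the sketch below assumes each variable is cut into rank-based (quantile) clusters for several cluster counts, and that the coefficient is the best agreement between any pair of partitions, measured with the adjusted Rand index. These choices, and the names `quantile_partition`, `adjusted_rand_index`, and `clustering_correlation`, are assumptions made for illustration and are not the authors' implementation or API.

```python
from collections import Counter
from math import comb


def quantile_partition(values, k):
    """Split values into k roughly equal-sized clusters by rank.

    Ties are broken by original position, which is good enough for a sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // n
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label lists of equal length."""
    total = comb(len(a), 2)
    sum_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_ab - expected) / (max_index - expected)


def clustering_correlation(x, y, max_k=10):
    """Assumed clustering-based coefficient in [0, 1].

    Takes the maximum adjusted Rand index over all pairs of quantile
    partitions of x and y, with negative agreement clipped to zero.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    ks = range(2, min(max_k, n - 1) + 1)
    parts_x = [quantile_partition(x, k) for k in ks]
    parts_y = [quantile_partition(y, k) for k in ks]
    best = max(adjusted_rand_index(px, py) for px in parts_x for py in parts_y)
    return max(best, 0.0)


if __name__ == "__main__":
    x = list(range(-50, 51))
    linear = [2 * v + 1 for v in x]   # increasing linear relationship
    quadratic = [v * v for v in x]    # symmetric, so Pearson's r is zero
    print(clustering_correlation(x, linear))
    print(clustering_correlation(x, quadratic))
```

In this toy run the increasing linear pair reaches the maximum value, and the quadratic pair, whose Pearson correlation is zero by symmetry, still scores clearly above zero. That is the kind of "not-only-linear" behavior the abstract describes, though the sketch makes no claim about matching the published method's values or speed.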

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75d0/11951854/636106cf5112/nihms-2023070-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75d0/11951854/dfb2de8ac869/nihms-2023070-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75d0/11951854/f8b483255e06/nihms-2023070-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75d0/11951854/01c3744e1c09/nihms-2023070-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75d0/11951854/122c90331625/nihms-2023070-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75d0/11951854/c0a9ee6aee98/nihms-2023070-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75d0/11951854/54d2601e1988/nihms-2023070-f0006.jpg
