Hierarchical clustering of high-throughput expression data based on general dependences.

Affiliation

Emory University, Atlanta.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):1080-5. doi: 10.1109/TCBB.2013.99.

DOI:10.1109/TCBB.2013.99

PMID:24334400

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3905248/

Abstract

High-throughput expression technologies, such as gene expression arrays and liquid chromatography-mass spectrometry (LC-MS), measure thousands of features (genes or metabolites) on a continuous scale. In such data, both linear and nonlinear relations exist between features. Nonlinear relations can reflect critical regulation patterns in the biological system, but traditional clustering methods based on linear associations neither identify nor use them. Clustering based on general dependences, i.e., both linear and nonlinear relations, is hampered by the high dimensionality and high noise level of the data. We developed a sensitive nonparametric measure of general dependence between (groups of) random variables in high dimensions and, based on this measure, a hierarchical clustering method. In simulation studies, the method outperformed correlation-based and mutual information (MI)-based hierarchical clustering in clustering features with nonlinear dependences. We applied the method to a microarray data set measuring gene expression in a cell-cycle time series to show that it generates biologically relevant results. The R code is available at http://userwww.service.emory.edu/~tyu8/GDHC.
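To make the idea of clustering on general rather than linear dependence concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not define their dependence measure, so the sketch substitutes sample distance correlation as a stand-in nonparametric measure and uses a plain average-linkage merge rule. All function names, the simulated data, and the parameters (sample size, noise level, seed) are assumptions made for illustration only; the authors' own R implementation is at the URL given above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation, which captures only linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def centered_distances(x):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row_mean = [sum(r) / n for r in d]  # equals column means (symmetric matrix)
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (stand-in measure, not the paper's)."""
    a, b = centered_distances(x), centered_distances(y)
    # The 1/n^2 factors cancel in the ratio, so raw sums are used.
    dcov2 = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    dvar_x = sum(p * p for ra in a for p in ra)
    dvar_y = sum(q * q for rb in b for q in rb)
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def average_linkage(dist, n_clusters):
    """Agglomerative clustering on a full dissimilarity matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
                d = total / (len(clusters[p]) * len(clusters[q]))
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return clusters


# Hypothetical data: two independent latent signals, each observed once
# linearly and once through a parabola (a nonlinear dependence).
rng = random.Random(0)
n = 200
u = [rng.uniform(-1, 1) for _ in range(n)]
v = [rng.uniform(-1, 1) for _ in range(n)]
features = [
    [a + rng.gauss(0, 0.05) for a in u],      # feature 0: u
    [a * a + rng.gauss(0, 0.05) for a in u],  # feature 1: u^2
    [a + rng.gauss(0, 0.05) for a in v],      # feature 2: v
    [a * a + rng.gauss(0, 0.05) for a in v],  # feature 3: v^2
]

k = len(features)
dissim = [[0.0] * k for _ in range(k)]
for i in range(k):
    for j in range(i + 1, k):
        dc = distance_correlation(features[i], features[j])
        dissim[i][j] = dissim[j][i] = 1.0 - dc

clusters = average_linkage(dissim, 2)
print("clusters:", clusters)
print("Pearson(u, u^2):", pearson(features[0], features[1]))
print("dCor(u, u^2):   ", distance_correlation(features[0], features[1]))
```

In this toy setting the Pearson correlation between a feature and its square is expected to be near zero, while the distance correlation is clearly positive, so a dissimilarity of one minus the dependence measure can group each parabola with its source signal. A correlation-based dissimilarity would see all four features as roughly equally unrelated.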
