Overlapping clustering of gene expression data using penalized weighted normalized cut.

Authors

Teran Hidalgo Sebastian J, Zhu Tingyu, Wu Mengyun, Ma Shuangge

Affiliations

Department of Biostatistics, Yale University, New Haven, Connecticut.

Department of Statistics, Xiamen University, Xiamen, China.

Publication information

Genet Epidemiol. 2018 Dec;42(8):796-811. doi: 10.1002/gepi.22164. Epub 2018 Oct 9.

Abstract

Clustering has been widely used in the analysis of gene expression data. For complex diseases, it has played an important role in identifying unknown functions of genes and in serving as the basis for other analyses, among other uses. A common limitation of most existing clustering approaches is the assumption that genes are separated into disjoint clusters. As genes often have multiple functions and thus can belong to more than one functional cluster, disjoint clustering results can be unsatisfactory. In addition, due to the small sample sizes of genetic profiling studies and other factors, there may not be sufficient evidence to confirm the specific functions of some genes and cluster them definitively into disjoint clusters. In this study, we develop an effective overlapping clustering approach, which takes into account the multiplicity of gene functions and the lack of certainty in practical analysis. A penalized weighted normalized cut (PWNCut) criterion is proposed based on the NCut technique and a norm constraint. It outperforms multiple competitors in simulation. The analysis of The Cancer Genome Atlas (TCGA) data on breast cancer and cervical cancer leads to biologically sensible findings which differ from those obtained using the alternatives. To facilitate implementation, we develop the function pwncut in the R package NCutYX.
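As background for the NCut technique that the abstract builds on, the sketch below computes the standard normalized cut value of a disjoint partition from a pairwise similarity matrix. It is not the PWNCut criterion, whose weights and penalty are not given in the abstract, and it does not use the pwncut function. The function name `ncut_value` and the toy similarity matrix are hypothetical and chosen only for illustration.

```python
def ncut_value(W, labels):
    """Standard normalized cut of a hard (disjoint) partition.

    W      : symmetric similarity matrix as a list of lists (hypothetical input)
    labels : cluster label for each row/column of W
    Returns the sum over clusters of cut(A, complement) / vol(A).
    """
    n = len(W)
    total = 0.0
    for k in set(labels):
        cut = 0.0  # similarity between cluster k and everything outside it
        vol = 0.0  # total similarity attached to the members of cluster k
        for i in range(n):
            if labels[i] != k:
                continue
            for j in range(n):
                vol += W[i][j]
                if labels[j] != k:
                    cut += W[i][j]
        total += cut / vol
    return total


# Toy data (assumed): genes 0-1 and genes 2-3 form two tight groups.
W = [
    [0.0, 1.0, 0.1, 0.1],
    [1.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 1.0],
    [0.1, 0.1, 1.0, 0.0],
]

good = ncut_value(W, [0, 0, 1, 1])  # partition that matches the two groups
bad = ncut_value(W, [0, 1, 0, 1])   # partition that splits both groups
print(good, bad)
```

A lower value indicates a partition that cuts through less similarity relative to cluster volume, so the grouping that respects the two tight pairs scores lower than the one that splits them. An overlapping method such as the one described above would relax the requirement that each gene carries exactly one label.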



