Fuzzy c-means clustering with prior biological knowledge.

Authors

Tari Luis, Baral Chitta, Kim Seungchan

Affiliation

School of Computing and Informatics, Department of Computer Science and Engineering, Ira A. Fulton School of Engineering, Arizona State University, P.O. Box 878809, Tempe, AZ 85287-8809, USA.

Publication

J Biomed Inform. 2009 Feb;42(1):74-81. doi: 10.1016/j.jbi.2008.05.009. Epub 2008 May 24.

DOI:10.1016/j.jbi.2008.05.009

PMID:18595779

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2673503/

Abstract

We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were used to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far more biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
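The abstract describes GO Fuzzy c-means only at a high level. For reference, the sketch below implements plain fuzzy c-means (fuzzifier m, Euclidean distance, alternating membership and centroid updates) in pure Python, so each gene ends up with a membership in every cluster rather than a single hard label. The `annotations` argument, which seeds each centroid from the mean profile of a few pre-labeled genes, and the `multi_cluster_genes` threshold of 0.3 are hypothetical illustrations of "prior knowledge" and "multiple clusters"; they are assumptions, not the authors' GO Fuzzy c-means, whose actual use of Gene Ontology annotations is described in the full paper and the released source code.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _memberships(data, centroids, m):
    """Standard FCM update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exp = 2.0 / (m - 1.0)
    u = []
    for x in data:
        d = [_dist(x, v) for v in centroids]
        zeros = [j for j, dj in enumerate(d) if dj == 0.0]
        if zeros:
            # Gene sits exactly on one or more centroids: share membership among them.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
        else:
            row = [1.0 / sum((dj / dk) ** exp for dk in d) for dj in d]
        u.append(row)
    return u


def _centroids(data, u, m, prev):
    """Centroid update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m."""
    out = []
    for j, old in enumerate(prev):
        w = [row[j] ** m for row in u]
        total = sum(w)
        if total == 0.0:
            out.append(list(old))  # no support at all: keep the previous centroid
            continue
        out.append([sum(wi * x[k] for wi, x in zip(w, data)) / total
                    for k in range(len(old))])
    return out


def seed_centroids_from_annotations(data, annotations: Dict[int, int], c, rng):
    """HYPOTHETICAL prior-knowledge hook: centroid j starts at the mean profile of
    genes labeled with class j; classes with no labeled gene start at a random gene."""
    centroids = []
    for j in range(c):
        members = [data[i] for i, label in annotations.items() if label == j]
        if members:
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centroids.append(list(rng.choice(data)))
    return centroids


def fuzzy_c_means(data: List[List[float]], c: int, m: float = 2.0,
                  annotations: Optional[Dict[int, int]] = None,
                  max_iter: int = 100, tol: float = 1e-6, seed: int = 0):
    """Return (centroids, memberships); memberships[i][j] is gene i's degree in cluster j."""
    rng = random.Random(seed)
    if annotations:
        centroids = seed_centroids_from_annotations(data, annotations, c, rng)
    else:
        centroids = [list(x) for x in rng.sample(data, c)]
    u = _memberships(data, centroids, m)
    for _ in range(max_iter):
        new_centroids = _centroids(data, u, m, centroids)
        shift = max(_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        u = _memberships(data, centroids, m)
        if shift < tol:
            break
    return centroids, u


def multi_cluster_genes(u, threshold: float = 0.3):
    """For each gene, the clusters whose membership is at least `threshold`."""
    return [[j for j, uij in enumerate(row) if uij >= threshold] for row in u]


if __name__ == "__main__":
    profiles = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [2.5, 2.5]]
    prior = {0: 0, 2: 1}  # hypothetical labels: gene 0 in class 0, gene 2 in class 1
    centroids, u = fuzzy_c_means(profiles, c=2, annotations=prior)
    for i, row in enumerate(u):
        print(i, [round(x, 2) for x in row])
    print(multi_cluster_genes(u))
```

In this toy run the two tight groups receive memberships close to 1 in their own cluster, while the gene at (2.5, 2.5) splits its membership roughly evenly and is reported in both clusters. The fuzzifier m controls how soft the partition is: values closer to 1 approach hard k-means assignments.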

Similar articles

Fuzzy c-means clustering with prior biological knowledge.

J Biomed Inform. 2009 Feb;42(1):74-81. doi: 10.1016/j.jbi.2008.05.009. Epub 2008 May 24.

Incorporating gene ontology into fuzzy relational clustering of microarray gene expression data.

Biosystems. 2018 Jan;163:1-10. doi: 10.1016/j.biosystems.2017.09.017. Epub 2017 Nov 4.

Analysis of expression profile using fuzzy adaptive resonance theory.

Bioinformatics. 2002 Aug;18(8):1073-83. doi: 10.1093/bioinformatics/18.8.1073.

Rough-fuzzy clustering for grouping functionally similar genes from microarray data.

IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):286-99. doi: 10.1109/TCBB.2012.103.

A new validity measure for a correlation-based fuzzy c-means clustering algorithm.

Annu Int Conf IEEE Eng Med Biol Soc. 2009;2009:3865-8. doi: 10.1109/IEMBS.2009.5332582.

Microarray data mining using landmark gene-guided clustering.

BMC Bioinformatics. 2008 Feb 11;9:92. doi: 10.1186/1471-2105-9-92.

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.

BMC Bioinformatics. 2006 Aug 31;7:397. doi: 10.1186/1471-2105-7-397.

Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Jul-Aug;12(4):887-901. doi: 10.1109/TCBB.2014.2359433.

FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data.

BMC Bioinformatics. 2007 Jan 4;8:3. doi: 10.1186/1471-2105-8-3.

Cited by

Robust and rigorous identification of tissue-specific genes by statistically extending tau score.

BioData Min. 2022 Dec 9;15(1):31. doi: 10.1186/s13040-022-00315-9.

An improved Fuzzy based GWO algorithm for predicting the potential host receptor of COVID-19 infection.

Comput Biol Med. 2022 Dec;151(Pt A):106050. doi: 10.1016/j.compbiomed.2022.106050. Epub 2022 Aug 25.

Integrative clustering methods for multi-omics data.

Wiley Interdiscip Rev Comput Stat. 2022 May-Jun;14(3). doi: 10.1002/wics.1553. Epub 2021 Feb 7.

A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies.

BioData Min. 2018 Aug 7;11:16. doi: 10.1186/s13040-018-0178-4. eCollection 2018.

An unsupervised machine learning method for discovering patient clusters based on genetic signatures.

J Biomed Inform. 2018 Sep;85:30-39. doi: 10.1016/j.jbi.2018.07.004. Epub 2018 Jul 29.

Applications of Bayesian network models in predicting types of hematological malignancies.

Sci Rep. 2018 May 3;8(1):6951. doi: 10.1038/s41598-018-24758-5.

Meta-analysis of cell-specific transcriptomic data using fuzzy c-means clustering discovers versatile viral responsive genes.

BMC Bioinformatics. 2017 Jun 6;18(1):295. doi: 10.1186/s12859-017-1669-x.

Semi-Supervised Fuzzy Clustering with Feature Discrimination.

PLoS One. 2015 Sep 1;10(9):e0131160. doi: 10.1371/journal.pone.0131160. eCollection 2015.

Integrative clustering methods for high-dimensional molecular data.

Transl Cancer Res. 2014 Jun 1;3(3):202-216. doi: 10.3978/j.issn.2218-676X.2014.06.03.

Semi-supervised consensus clustering for gene expression data analysis.

BioData Min. 2014 May 8;7:7. doi: 10.1186/1756-0381-7-7. eCollection 2014.
