Fuzzy ensemble clustering based on random projections for DNA microarray data analysis.

Authors

Avogadri Roberto, Valentini Giorgio

Affiliation

DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39, 20135 Milano, Italy.

Publication information

Artif Intell Med. 2009 Feb-Mar;45(2-3):173-83. doi: 10.1016/j.artmed.2008.07.014. Epub 2008 Sep 17.

DOI: 10.1016/j.artmed.2008.07.014

PMID: 18801650

Abstract

OBJECTIVE

Two major problems in the unsupervised analysis of gene expression data are the accuracy and reliability of the discovered clusters, and the biological fact that the boundaries between classes of patients or classes of functionally related genes are sometimes not clearly defined. The main goal of this work is to explore new strategies and develop new clustering methods that improve the accuracy and robustness of clustering results while taking into account the uncertainty underlying the assignment of examples to clusters in gene expression data analysis.

METHODOLOGY

We propose a fuzzy ensemble clustering approach both to improve the accuracy of clustering results and to take into account the inherent fuzziness of biological and biomedical gene expression data. We apply random projections that obey the Johnson-Lindenstrauss lemma to obtain several lower-dimensional instances of the original high-dimensional gene expression data, approximately preserving the information and metric structure of the original data. We then adopt a double fuzzy approach to obtain a consensus ensemble clustering: first, a fuzzy k-means algorithm is applied to the different instances of the projected low-dimensional data; then a fuzzy t-norm is used to combine the multiple clusterings. Several variants of the fuzzy ensemble clustering algorithm are proposed, which differ in the techniques used to combine the base clusterings and to obtain the final consensus clustering.
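
The pipeline described above (random projection, fuzzy k-means on each projected instance, t-norm combination, consensus) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not specify these details, so the following are all assumptions made for illustration: the function names (`random_projection`, `fuzzy_kmeans`, `tnorm_similarity`, `fuzzy_ensemble`), the Gaussian projection matrix (one common Johnson-Lindenstrauss-type construction), the choice of the product and minimum t-norms, the averaging of the per-projection similarity matrices, and the final step of running fuzzy k-means on the rows of the averaged matrix.

```python
import numpy as np

def random_projection(X, d, rng):
    """Project the rows of X (n x D) to d dimensions with a Gaussian
    random matrix scaled by 1/sqrt(d) (assumed JL-type construction)."""
    R = rng.normal(size=(X.shape[1], d)) / np.sqrt(d)
    return X @ R

def fuzzy_kmeans(X, k, m=2.0, n_iter=100, tol=1e-6, rng=None):
    """Standard fuzzy k-means (fuzzy c-means) with fuzzifier m.
    Returns an (n x k) membership matrix whose rows sum to 1."""
    rng = np.random.default_rng() if rng is None else rng
    U = rng.dirichlet(np.ones(k), size=X.shape[0])  # random fuzzy partition
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance of every point to every center, shape (n, k)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U

def tnorm_similarity(U, tnorm="product"):
    """Fuzzy co-membership S[i, j] = sum_c T(U[i, c], U[j, c])."""
    if tnorm == "product":   # algebraic product t-norm
        return U @ U.T
    if tnorm == "min":       # minimum t-norm
        return np.minimum(U[:, None, :], U[None, :, :]).sum(axis=2)
    raise ValueError(f"unknown t-norm: {tnorm}")

def fuzzy_ensemble(X, k, d, n_base=10, tnorm="product", seed=0):
    """Average the t-norm similarity matrices of n_base base clusterings,
    then cluster the rows of the averaged matrix (assumed consensus step)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = np.zeros((n, n))
    for _ in range(n_base):
        Z = random_projection(X, d, rng)   # one low-dimensional instance
        U = fuzzy_kmeans(Z, k, rng=rng)    # base fuzzy clustering
        S += tnorm_similarity(U, tnorm)
    S /= n_base
    U_final = fuzzy_kmeans(S, k, rng=rng)  # consensus memberships
    return U_final, S

# Toy usage on synthetic data: two well-separated groups of 20 samples
# in 200 dimensions (a stand-in for patients x genes, not real data).
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([demo_rng.normal(3.0, 1.0, (20, 200)),
                    demo_rng.normal(-3.0, 1.0, (20, 200))])
U_demo, S_demo = fuzzy_ensemble(X_demo, k=2, d=20, n_base=10)
print(U_demo.argmax(axis=1))  # hard labels obtained from the fuzzy memberships
```

Taking the argmax of the consensus memberships is only one way to read off a crisp partition; keeping the memberships themselves retains the uncertainty about borderline samples that motivates the fuzzy approach.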

RESULTS AND CONCLUSION

We applied the proposed fuzzy ensemble methods to the gene expression analysis of leukemia, lymphoma, adenocarcinoma and melanoma patients, and compared the results with other state-of-the-art ensemble methods. The results show that in some cases, taking into account the natural fuzziness of the data improves the discovery of classes of patients defined at the bio-molecular level. The dimensionality reduction achieved through random projection techniques is well suited to the characteristics of high-dimensional gene expression data, resulting in improved performance with respect to both single fuzzy k-means and ensemble methods based on resampling techniques. Moreover, we show that analyzing the accuracy and diversity of the base fuzzy clusterings can help explain the advantages and limitations of the proposed fuzzy ensemble approach.

Similar articles

Fuzzy ensemble clustering based on random projections for DNA microarray data analysis.

Artif Intell Med. 2009 Feb-Mar;45(2-3):173-83. doi: 10.1016/j.artmed.2008.07.014. Epub 2008 Sep 17.

Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses.

Artif Intell Med. 2006 Jun;37(2):85-109. doi: 10.1016/j.artmed.2006.03.005. Epub 2006 May 23.

Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering.

J Biosci Bioeng. 2008 Mar;105(3):273-81. doi: 10.1263/jbb.105.273.

Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps.

Artif Intell Med. 2010 Feb-Mar;48(2-3):91-8. doi: 10.1016/j.artmed.2009.06.001. Epub 2009 Dec 4.

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

LCE: a link-based cluster ensemble method for improved gene expression data analysis.

Bioinformatics. 2010 Jun 15;26(12):1513-9. doi: 10.1093/bioinformatics/btq226. Epub 2010 May 5.

Techniques for clustering gene expression data.

Comput Biol Med. 2008 Mar;38(3):283-93. doi: 10.1016/j.compbiomed.2007.11.001. Epub 2007 Dec 3.

Microarray data clustering based on temporal variation: FCV with TSD preclustering.

Appl Bioinformatics. 2003;2(1):35-45.

Knowledge based cluster ensemble for cancer discovery from biomolecular data.

IEEE Trans Nanobioscience. 2011 Jun;10(2):76-85. doi: 10.1109/TNB.2011.2144997. Epub 2011 Jul 7.

Cited by

Phenotype clustering in health care: A narrative review for clinicians.

Front Artif Intell. 2022 Aug 12;5:842306. doi: 10.3389/frai.2022.842306. eCollection 2022.

Unsupervised Algorithms for Microarray Sample Stratification.

Methods Mol Biol. 2022;2401:121-146. doi: 10.1007/978-1-0716-1839-4_9.

Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):660. doi: 10.1186/s12859-019-3179-5.

Prediction of slaughter age in pigs and assessment of the predictive value of phenotypic and genetic information using random forest.

J Anim Sci. 2018 Dec 3;96(12):4935-4943. doi: 10.1093/jas/sky359.

Clustering cancer gene expression data by projective clustering ensemble.

PLoS One. 2017 Feb 24;12(2):e0171429. doi: 10.1371/journal.pone.0171429. eCollection 2017.

Interpolation based consensus clustering for gene expression time series.

BMC Bioinformatics. 2015 Apr 16;16:117. doi: 10.1186/s12859-015-0541-0.

Ensemble-based prediction of RNA secondary structures.

BMC Bioinformatics. 2013 Apr 24;14:139. doi: 10.1186/1471-2105-14-139.

A new exact test for the evaluation of population pharmacokinetic and/or pharmacodynamic models using random projections.

Pharm Res. 2011 Aug;28(8):1948-62. doi: 10.1007/s11095-011-0422-9. Epub 2011 Apr 14.
