Efficient Mining of Discriminative Co-clusters from Gene Expression Data.

Authors

Omar Odibat, Chandan K. Reddy

Affiliation

Department of Computer Science, Wayne State University, Detroit, MI, 48202.

Publication

Knowl Inf Syst. 2014 Dec;41(3):667-696. doi: 10.1007/s10115-013-0684-0.

Abstract

Discriminative models are used to analyze the differences between two classes and to identify class-specific patterns. Most existing discriminative models use the entire feature space to compute the discriminative patterns for each class. Co-clustering has been proposed to capture patterns that are correlated in a subset of features, but it cannot handle discriminative patterns in labeled datasets. In certain biological applications, such as gene expression analysis, it is critical to consider discriminative patterns that are correlated only in a subset of the feature space. The objective of this paper is two-fold. First, it presents an algorithm to efficiently find arbitrarily positioned co-clusters in complex data. Second, it extends this co-clustering algorithm to discover discriminative co-clusters by incorporating class information into the co-cluster search process. We also characterize discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace pattern mining algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and our experimental results showed that the proposed algorithms outperformed several existing algorithms from the literature.
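The abstract does not state which coherence criterion the authors' algorithm uses. As a rough illustration of what "correlated only in a subset of the feature space" can mean, the sketch below substitutes a standard measure from the biclustering literature: the Cheng-Church mean squared residue (MSR) of a genes-by-samples submatrix. It then adds a *hypothetical* `discriminative_gap` score that is low-MSR in one class and high-MSR in the other. Neither function is the paper's method or one of its three proposed measures. The matrix layout (rows = genes, columns = samples) and all names are assumptions.

```python
from typing import Sequence


def mean_squared_residue(data: Sequence[Sequence[float]],
                         rows: Sequence[int],
                         cols: Sequence[int]) -> float:
    """Cheng-Church mean squared residue of the submatrix data[rows][cols].

    0 means perfect additive coherence: every entry equals
    row mean + column mean - overall mean of the submatrix.
    """
    sub = [[data[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


def discriminative_gap(data: Sequence[Sequence[float]],
                       genes: Sequence[int],
                       class_a_samples: Sequence[int],
                       class_b_samples: Sequence[int]) -> float:
    """Hypothetical score (not from the paper): positive when the gene
    subset is more coherent (lower MSR) over class A samples than class B."""
    return (mean_squared_residue(data, genes, class_b_samples)
            - mean_squared_residue(data, genes, class_a_samples))


if __name__ == "__main__":
    # Toy data: genes 0 and 1 are shifted copies over samples 0-2 (class A)
    # but unrelated over samples 3-5 (class B).
    data = [
        [1.0, 2.0, 3.0, 5.0, 1.0, 4.0],
        [2.0, 3.0, 4.0, 1.0, 6.0, 2.0],
        [0.0, 9.0, 4.0, 7.0, 3.0, 8.0],
    ]
    print(mean_squared_residue(data, [0, 1], [0, 1, 2]))
    print(discriminative_gap(data, [0, 1], [0, 1, 2], [3, 4, 5]))
```

On the toy data, genes 0 and 1 form a perfectly coherent co-cluster over the class A samples (MSR 0), while their MSR over the class B samples is about 3.72. This is the kind of class-specific, subspace-only pattern the abstract describes. A real search would also have to choose the gene and sample subsets, which this sketch takes as given.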
