Attribute clustering for grouping, selection, and classification of gene expression data.

Authors

Au Wai-Ho, Chan Keith C C, Wong Andrew K C, Wang Yang

Affiliation

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2005 Apr-Jun;2(2):83-101. doi: 10.1109/TCBB.2005.17.

DOI: 10.1109/TCBB.2005.17
PMID: 17044174

Abstract

This paper presents an attribute clustering method that groups genes based on their interdependence in order to mine meaningful patterns from gene expression data. It can be used for gene grouping, selection, and classification. Partitioning a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. Clustering attributes thus reduces the search dimension of a data mining algorithm. This reduction is especially important for gene expression data, which typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are developed and optimized to scale with the number of tuples rather than the number of attributes. The situation becomes even worse when the number of attributes overwhelms the number of tuples, in which case the likelihood of reporting patterns that arise purely by chance and are actually irrelevant becomes rather high. For these reasons, gene grouping and selection are important preprocessing steps that many data mining algorithms need in order to be effective on gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. The proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. Applying the algorithm to gene expression data reveals meaningful clusters of genes. Grouping genes by attribute interdependence within each group helps capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification.
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method is able to find meaningful clusters of genes. Selecting a subset of genes that have high multiple-interdependence with others within their clusters yields significant classification information. Thus, a small pool of selected genes can be used to build classifiers with a very high classification rate. From this pool, gene expressions of different categories can be identified.
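
The abstract does not spell out the information measure or the optimization procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that expression values are already discretized, that interdependence is measured as mutual information normalized by joint entropy, and that clusters are formed by a k-modes-style loop in which each cluster's representative ("mode") is the attribute with the highest summed interdependence with the other members. All function names (`entropy`, `interdependence`, `pick_mode`, `cluster_attributes`) and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def interdependence(x, y):
    """Assumed measure: mutual information I(X;Y) divided by joint entropy H(X,Y)."""
    h_xy = entropy(list(zip(x, y)))
    if h_xy == 0:
        return 1.0  # both attributes are constant: treat them as fully redundant
    return (entropy(x) + entropy(y) - h_xy) / h_xy

def pick_mode(members, columns):
    """Member with the highest summed interdependence with the rest of its cluster."""
    return max(members, key=lambda a: sum(
        interdependence(columns[a], columns[b]) for b in members if b != a))

def cluster_attributes(columns, k, max_iter=20):
    """k-modes-style clustering of attributes (columns), not of tuples (rows).

    Returns a list of (selected_attribute, members) pairs, one per non-empty cluster.
    """
    names = list(columns)
    # Deterministic farthest-first initialization: start from the first attribute,
    # then repeatedly add the attribute least interdependent with the chosen modes.
    modes = [names[0]]
    while len(modes) < k:
        modes.append(min((a for a in names if a not in modes),
                         key=lambda a: max(interdependence(columns[a], columns[m])
                                           for m in modes)))
    for _ in range(max_iter):
        # Assignment step: each attribute joins the mode it depends on most.
        clusters = {m: [] for m in modes}
        for a in names:
            best = max(modes, key=lambda m: interdependence(columns[a], columns[m]))
            clusters[best].append(a)
        # Update step: recompute the mode of every non-empty cluster.
        new_modes = [pick_mode(ms, columns) if ms else m for m, ms in clusters.items()]
        if new_modes == modes:
            break
        modes = new_modes
    return [(pick_mode(ms, columns), ms) for ms in clusters.values() if ms]

# Toy data: four discretized "genes" observed over six samples.
toy = {
    "g1": [0, 0, 1, 1, 2, 2],
    "g2": [1, 1, 2, 2, 0, 0],  # a relabeling of g1, so fully interdependent with it
    "g3": [0, 1, 0, 1, 0, 1],
    "g4": [1, 0, 1, 0, 1, 0],  # a relabeling of g3
}
result = cluster_attributes(toy, k=2)
print(result)
```

In this sketch the first element of each pair plays the role of the "significant gene" selected from a cluster; a downstream classifier would then be trained on those selected attributes only. Real expression data would first need a discretization step, which is omitted here.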
