Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China.
IEEE Trans Biomed Eng. 2011 May;58(5):1246-52. doi: 10.1109/TBME.2010.2047724. Epub 2010 Apr 15.
Because the underlying biological processes are complex, gene expression data obtained from DNA microarray technologies are typically noisy and very high-dimensional, which makes mining such data for gene function prediction difficult. To tackle these difficulties, we propose a technique called incremental fuzzy mining (IFM). By transforming quantitative expression values into linguistic terms, such as highly or lowly expressed, IFM can effectively capture heterogeneity in expression data for pattern discovery. It uses a fuzzy measure to determine whether interesting association patterns exist between the linguistic gene expression levels. Based on these patterns, IFM can make accurate gene function predictions in which each gene may belong to more than one functional class with different degrees of membership. The gene function prediction problem can be formulated both as a classification problem and as a clustering problem, and IFM can be used either as a classification technique or together with existing clustering algorithms to improve the discovered cluster groupings and thereby achieve greater prediction accuracy. IFM is also incremental: the discovered patterns can be continually refined using only newly collected data, without retraining on the whole dataset. For performance evaluation, IFM has been tested on real expression datasets for both classification and clustering tasks. Experimental results show that it can effectively uncover hidden patterns for accurate gene function prediction.
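The two ideas the abstract names, mapping expression values to linguistic terms with degrees of membership and refining pattern statistics from new data alone, can be illustrated with a minimal Python sketch. This is not the IFM algorithm or its fuzzy measure, which the abstract does not specify. The linear membership ramp, the breakpoints `low` and `high`, the class name `FuzzyCounts`, and the support ratio are all assumptions chosen for illustration.

```python
from collections import defaultdict


def fuzzify(value, low=0.0, high=1.0):
    """Return degrees of membership in the terms 'low' and 'high'.

    Assumption: a linear ramp between the hypothetical breakpoints
    `low` and `high`; the abstract does not give membership functions.
    """
    if high <= low:
        raise ValueError("high must exceed low")
    t = min(1.0, max(0.0, (value - low) / (high - low)))
    return {"low": 1.0 - t, "high": t}


class FuzzyCounts:
    """Running fuzzy co-occurrence counts of (condition, term, class)."""

    def __init__(self):
        self.counts = defaultdict(float)
        self.class_totals = defaultdict(float)

    def update(self, profile, label):
        """Fold in one newly observed gene without revisiting older data."""
        self.class_totals[label] += 1.0
        for cond, value in enumerate(profile):
            for term, degree in fuzzify(value).items():
                self.counts[(cond, term, label)] += degree

    def support(self, cond, term, label):
        """Average membership of `term` at `cond` among genes in `label`."""
        total = self.class_totals[label]
        return self.counts[(cond, term, label)] / total if total else 0.0


# Hypothetical data: two genes, two experimental conditions, one class.
fc = FuzzyCounts()
fc.update([0.75, 0.25], "ribosomal")
fc.update([0.25, 0.25], "ribosomal")
high_support = fc.support(0, "high", "ribosomal")
```

Because `update` only adds the new gene's membership degrees to running totals, earlier profiles never need to be stored or rescanned, which is the sense of "incremental" the sketch is meant to convey.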