An Jiyuan, Chen Yi-Ping Phoebe
Faculty of Science and Technology, Deakin University, Melbourne, Victoria, Australia.
Comput Biol Chem. 2009 Feb;33(1):108-13. doi: 10.1016/j.compbiolchem.2008.07.031. Epub 2008 Aug 14.
Microarray data provide quantitative information about the transcription profile of cells. Machine learning methods have increasingly attracted bioinformatics researchers as a way to analyze microarray datasets, and some are widely used to classify and mine biological data. However, many gene expression datasets have extremely high dimensionality, so traditional machine learning methods cannot be applied to them effectively or efficiently. This paper proposes a robust algorithm that finds rule groups for classifying gene expression datasets. Unlike most classification algorithms, which select dimensions (genes) heuristically to form rule groups that identify classes such as cancerous and normal tissues, our algorithm is guaranteed to find the best k dimensions (genes) to form rule groups for classifying expression datasets. Our experiments show that the rule groups obtained by our algorithm achieve higher accuracy than other classification approaches.
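The abstract does not describe how the algorithm works internally, so the following is only an illustrative sketch of the distinction it draws between heuristic gene selection and a guaranteed best-k selection. It is not the authors' method. It assumes expression values already discretized to 0 (low) or 1 (high), a single conjunctive rule of the form "predict cancerous when every selected gene is high", and plain exhaustive search over all k-gene subsets. The names `rule_accuracy`, `best_k_genes`, `greedy_k_genes`, `SAMPLES`, and `LABELS`, as well as the toy data, are hypothetical.

```python
from itertools import combinations

# Toy data (hypothetical): rows are samples, columns are genes g0..g3,
# values are discretized expression levels (1 = high, 0 = low).
SAMPLES = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
LABELS = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = cancerous, 0 = normal


def rule_accuracy(samples, labels, genes):
    """Accuracy of the rule 'predict 1 when every selected gene is high'."""
    correct = 0
    for row, label in zip(samples, labels):
        predicted = int(all(row[g] == 1 for g in genes))
        correct += int(predicted == label)
    return correct / len(labels)


def best_k_genes(samples, labels, k):
    """Exhaustive search: score every k-gene subset and keep the best one."""
    n_genes = len(samples[0])
    return max(
        combinations(range(n_genes), k),
        key=lambda genes: rule_accuracy(samples, labels, genes),
    )


def greedy_k_genes(samples, labels, k):
    """Heuristic search: add one gene at a time, keeping the best so far."""
    chosen = []
    remaining = list(range(len(samples[0])))
    for _ in range(k):
        best = max(
            remaining,
            key=lambda g: rule_accuracy(samples, labels, chosen + [g]),
        )
        chosen.append(best)
        remaining.remove(best)
    return tuple(chosen)


exhaustive = best_k_genes(SAMPLES, LABELS, 2)
greedy = greedy_k_genes(SAMPLES, LABELS, 2)
print(exhaustive, rule_accuracy(SAMPLES, LABELS, exhaustive))  # (1, 2) 1.0
print(greedy, rule_accuracy(SAMPLES, LABELS, greedy))          # (0, 1) 0.875
```

On this toy data the greedy heuristic commits to gene 0 because it is the best single predictor, and so never reaches the pair (1, 2) that classifies every sample correctly. Exhaustive search is feasible here only because there are four genes. The number of k-gene subsets grows combinatorially, and the abstract does not say how the proposed algorithm keeps the search tractable on high-dimensional expression data.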