He Yulan, Hui Siu Cheung
Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes MK7 6AA, UK.
Artif Intell Med. 2009 Oct;47(2):105-19. doi: 10.1016/j.artmed.2009.03.004. Epub 2009 Apr 18.
Recently, much research has proposed using nature-inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm: it is based on swarm intelligence and derived from a model of the collective foraging behavior of ants. Taking advantage of ACO traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification.
An ant-based clustering algorithm (Ant-C) and an ant-based association rule mining algorithm (Ant-ARM) are proposed for gene expression data analysis. The proposed algorithms make use of natural ant behaviors such as cooperation and adaptation to allow a flexible, robust search for a good candidate solution.
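As a rough illustration of how ant behavior can drive clustering, the sketch below uses the pick-up and drop probabilities of the classical ant-clustering model (Deneubourg et al.). It is not taken from this paper and may differ from what Ant-C actually does. The function names, the constants `k1` and `k2`, and the local similarity density `f` are assumptions made for the example.

```python
def pick_probability(f, k1=0.1):
    """Chance that an unladen ant picks up an item.

    f is the local similarity density around the item (0 = isolated,
    larger = surrounded by similar items). k1 is an assumed constant.
    """
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """Chance that a laden ant drops its item at the current location.

    k2 is an assumed constant; it must be positive.
    """
    return (f / (k2 + f)) ** 2


# Isolated items are likely to be picked up and unlikely to be dropped;
# items among similar neighbours behave the opposite way.
for density in (0.0, 0.1, 0.9):
    print(density, pick_probability(density), drop_probability(density))
```

Under this model, clusters emerge without a preset cluster count because ants keep moving items away from sparse regions and depositing them in dense, similar neighbourhoods.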
Ant-C was tested on three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared with classical clustering methods. Ant-ARM was tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy.
Ant-C can generate the optimal number of clusters without incorporating any other algorithm such as K-means or agglomerative hierarchical clustering. For associative classification, while a few well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.
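To make "associative classification rule" concrete, the following minimal sketch scores one rule of the form "antecedent items imply class label" by its support and confidence, the standard measures in association rule mining. It does not reproduce Ant-ARM's search. The helper `rule_stats`, the item names such as `g1_high`, and the four toy samples are invented for illustration and are not drawn from the ALL/AML dataset.

```python
def rule_stats(samples, antecedent, label):
    """Return (support, confidence) of the rule antecedent -> label.

    samples: list of (set_of_items, class_label) pairs, where each item
    is a discretized expression level such as "g1_high" (hypothetical).
    """
    n = len(samples)
    # Class labels of the samples that contain every antecedent item.
    covered = [cls for items, cls in samples if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    support = hits / n if n else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Toy, made-up data: two hypothetical genes discretized into high/low.
samples = [
    ({"g1_high", "g2_low"}, "ALL"),
    ({"g1_high", "g2_high"}, "ALL"),
    ({"g1_low", "g2_low"}, "AML"),
    ({"g1_high", "g2_low"}, "AML"),
]

support, confidence = rule_stats(samples, {"g1_high"}, "ALL")
print(support, confidence)
```

With thousands of genes per sample, the number of candidate antecedents grows combinatorially, which is consistent with the abstract's remark that exhaustive miners did not finish on this dataset in reasonable time.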