Li Jinyan, Liu Huiqing, Ng See-Kiong, Wong Limsoon
Institute for Infocomm Research, Heng Mui Keng Terrace, Singapore.
Bioinformatics. 2003 Oct;19 Suppl 2:ii93-102. doi: 10.1093/bioinformatics/btg1066.
We introduce a new method for discovering many diverse and significant rules from high-dimensional profiling data. We also propose aggregating the discriminating power of these rules to make reliable predictions. The discovered rules contain low-ranked features, and these features are sometimes necessary for classifiers to achieve perfect accuracy. Our method's use of low-ranked but essential features contrasts with the prevailing practice of using only an ad hoc number of top-ranked features. On a wide range of data sets, our method displayed highly competitive accuracy compared with the best performance of other kinds of classification models. In addition to accuracy, our method provides comprehensible rules that help elucidate the translation from raw data to useful knowledge.
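The abstract does not spell out how the rules' discriminating power is aggregated, so the following is only a minimal sketch of one plausible reading: each rule is a conjunction of feature conditions that votes for a class, and a sample is assigned to the class whose satisfied rules carry the largest total weight. The `Rule` class, the `coverage` weight, the `predict` function, and the feature names `gene_a`, `gene_b`, and `gene_c` are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, float]


@dataclass
class Rule:
    """A hypothetical rule: if every condition holds, vote for `label`."""
    label: str
    coverage: float  # assumed weight, e.g. fraction of the class's training samples covered
    conditions: List[Callable[[Sample], bool]]

    def matches(self, sample: Sample) -> bool:
        # A rule fires only when all of its conditions are satisfied.
        return all(cond(sample) for cond in self.conditions)


def predict(rules: List[Rule], sample: Sample) -> str:
    """Sum the weights of the satisfied rules per class and return the top class."""
    if not rules:
        raise ValueError("at least one rule is required")
    scores: Dict[str, float] = {}
    for rule in rules:
        scores.setdefault(rule.label, 0.0)
        if rule.matches(sample):
            scores[rule.label] += rule.coverage
    # Ties are resolved in favor of the class seen first in `rules`.
    return max(scores, key=lambda label: scores[label])


# Illustrative rules over made-up features; the second "tumor" rule uses
# features that a top-k feature ranking might have discarded.
rules = [
    Rule("tumor", 0.9, [lambda s: s["gene_a"] > 1.0]),
    Rule("tumor", 0.4, [lambda s: s["gene_c"] <= 0.5, lambda s: s["gene_b"] > 0.3]),
    Rule("normal", 0.8, [lambda s: s["gene_a"] <= 1.0]),
    Rule("normal", 0.6, [lambda s: s["gene_b"] <= 0.3]),
]

prediction = predict(rules, {"gene_a": 1.5, "gene_b": 0.2, "gene_c": 0.9})
```

Because every rule stays an explicit list of conditions, the rules that fired for a given sample can be inspected directly, which is the kind of comprehensibility the abstract refers to.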