Decision Sciences Research Group, Manchester Business School East - F25, The University of Manchester, Manchester M15 9EP, United Kingdom.
Artif Intell Med. 2010 Oct;50(2):117-26. doi: 10.1016/j.artmed.2010.05.007. Epub 2010 Jun 20.
This paper reviews a methodology for evolving fuzzy classification that allows data to be processed online by recursively updating a fuzzy rule base, one sample at a time, as a data stream arrives. It also shows how this methodology can be improved and applied to diagnostics, using two popular medical problems.
The vast majority of existing methodologies for fuzzy medical diagnostics require the data records to be processed offline, as a batch. Unfortunately, this allows only a snapshot of the actual domain to be analysed. When new data records become available, incorporating them is computationally costly because re-learning is an iterative procedure. eClass is a relatively new architecture for evolving fuzzy rule-based systems that overcomes these problems. However, it is data-order dependent: different orderings of the data result in different rule bases. Nonetheless, it is shown that eClass models can be improved by arranging the order of the incoming data using a simple optimization strategy.
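The order dependence described above can be illustrated with a minimal Python sketch. This is not eClass and not the paper's optimization strategy: it assumes a toy single-pass prototype classifier (a stand-in for a rule base) and a plain random search over orderings. All names (`train_online`, `search_order`, `radius`, `trials`) are hypothetical and chosen only for illustration.

```python
import random

def euclid(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def train_online(stream, radius=1.0):
    """Single pass over (x, label) pairs; returns [center, label, count] prototypes."""
    protos = []
    for x, y in stream:
        same = [p for p in protos if p[1] == y]
        nearest = min(same, key=lambda p: euclid(p[0], x), default=None)
        if nearest is None or euclid(nearest[0], x) > radius:
            protos.append([list(x), y, 1])  # sample is far from its class: new prototype
        else:
            nearest[2] += 1                 # recursive mean update of the center
            k = nearest[2]
            nearest[0] = [c + (xi - c) / k for c, xi in zip(nearest[0], x)]
    return protos

def predict(protos, x):
    """Label of the nearest prototype."""
    return min(protos, key=lambda p: euclid(p[0], x))[1]

def accuracy(protos, data):
    return sum(predict(protos, x) == y for x, y in data) / len(data)

def search_order(train, valid, trials=50, seed=0, radius=1.0):
    """Random search over orderings (an assumed stand-in for the paper's strategy)."""
    rng = random.Random(seed)
    best_order = list(train)
    baseline = best_acc = accuracy(train_online(best_order, radius), valid)
    for _ in range(trials):
        cand = list(train)
        rng.shuffle(cand)
        acc = accuracy(train_online(cand, radius), valid)
        if acc > best_acc:
            best_order, best_acc = cand, acc
    return best_order, baseline, best_acc
```

Because the original ordering is itself a candidate, the returned accuracy is never below the baseline. If the ordering is selected on the same data used for reporting, the estimate is optimistic, so a separate held-out set would be needed for a fair comparison.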
For the Pima Indians diabetes dataset, an accuracy of 79.37% was obtained, which is 0.84% lower than the highest in the literature. The proposed optimization strategy increased the accuracy and specificity of the model by 4.05% and 7.63%, respectively. For the dermatology dataset, an accuracy of 97.55% was obtained, which is 1.65% lower than the highest in the literature. In this case, the proposed optimization strategy improved the accuracy of the model by 4.82%. The improved algorithm has been compared with other existing algorithms and appears to outperform most of them.
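For reference, the two metrics reported above have standard definitions in terms of confusion-matrix counts. The sketch below uses those textbook definitions. The function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
def accuracy_from_counts(tp, tn, fp, fn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def specificity_from_counts(tn, fp):
    """Fraction of truly negative samples classified as negative."""
    return tn / (tn + fp)

# Hypothetical counts, for illustration only.
acc = accuracy_from_counts(tp=30, tn=50, fp=10, fn=10)
spec = specificity_from_counts(tn=50, fp=10)
```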
This paper has shown that eClass can be applied effectively to the classification of diabetes and dermatological diseases from discrete numerical samples. The results of using a novel optimization strategy indicate that the accuracy of eClass models can be further improved. Finally, the system can mine human-readable rules, which could help medical experts gain a better understanding of a sample under analysis throughout the traditional diagnostic process.