Hirano S, Tsumoto S, Okuzaki T, Hata Y
Department of Medical Informatics, Shimane Medical University, School of Medicine, Izumo, Shimane 691-8501, Japan.
Stud Health Technol Inform. 2001;84(Pt 1):206-10.
This paper proposes a clustering method for nominal and numerical data based on rough sets, and its application to knowledge discovery in a medical database. Classification is performed according to indiscernibility relations defined on the basis of the relative similarity between objects. The similarity is defined as a combination of two measures: the Hamming distance for nominal attributes and the Mahalanobis distance for numerical attributes. Excessive generation of small categories is suppressed by modifying similar equivalence relations so that they become the same equivalence relation. The method was validated by analyzing a meningoencephalitis diagnosis database. The results showed that the method could handle both types of attributes well and could discover the primary factors for diagnosis.
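The abstract names the two distance measures but does not say how they are combined. The following is a minimal sketch of one possible combination, not the authors' formulation: the function names, the conversion of each distance to a similarity in [0, 1], and the `weight` parameter are all assumptions made for illustration. It uses only the Python standard library.

```python
import math
from statistics import mean


def hamming_distance(a, b):
    """Count the positions at which two nominal attribute tuples differ."""
    if len(a) != len(b):
        raise ValueError("nominal attribute vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of numerical rows."""
    n, d = len(rows), len(rows[0])
    means = [mean(col) for col in zip(*rows)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis_distance(u, v, cov):
    """sqrt((u - v)^T cov^{-1} (u - v)), computed without forming cov^{-1}."""
    diff = [a - b for a, b in zip(u, v)]
    z = solve(cov, diff)  # z = cov^{-1} * diff
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


def combined_similarity(nom_a, nom_b, num_a, num_b, cov, weight=0.5):
    """Hypothetical combination: weighted mean of two similarities in [0, 1].

    Assumes at least one nominal attribute. The mapping 1 / (1 + distance)
    and the weighting scheme are illustrative choices, not from the paper.
    """
    nominal_sim = 1.0 - hamming_distance(nom_a, nom_b) / len(nom_a)
    numerical_sim = 1.0 / (1.0 + mahalanobis_distance(num_a, num_b, cov))
    return weight * nominal_sim + (1.0 - weight) * numerical_sim
```

In this sketch the covariance matrix would be estimated once from the numerical columns of the whole data set with `covariance_matrix` and then reused for every pair of objects. Two objects with identical attributes get a similarity of 1, and the value decreases as either distance grows.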