Das Surajit, Sultana Mahamuda, Bhattacharya Suman, Sengupta Diganta, De Debashis
Department of Information Technology, Meghnad Saha Institute of Technology, Kolkata, 700150 India.
Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, West Bengal, Nadia, 741249 West Bengal India.
J Supercomput. 2023 May 12:1-31. doi: 10.1007/s11227-023-05356-3.
Machine learning (ML) has been used to classify heart diseases for almost a decade, yet understanding the internal workings of black-box (i.e., non-interpretable) models remains a demanding problem. Another major challenge for such ML models is the curse of dimensionality, which makes classification with the comprehensive feature vector (CFV) resource intensive. This study focuses on dimensionality reduction using explainable artificial intelligence without compromising accuracy in heart disease classification. Four ML models, explained using SHAP, were used for classification; the explanations reflected the feature contributions (FC) and feature weights (FW) of each feature in the CFV in generating the final results. FC and FW were then taken into account in generating the reduced-dimensional feature subset (FS). The findings of the study are as follows: (a) XGBoost classifies heart diseases best with explanations, with an increase of 2% in model accuracy over the best existing proposals; (b) explainable classification using the FS exhibits better accuracy than most proposals in the literature; (c) as explainability increases, accuracy can be preserved when using the XGBoost classifier to classify heart diseases; and (d) the top four features responsible for the diagnosis of heart disease are identified, which occur in all the explanations produced by the five explainable techniques applied to the XGBoost classifier based on feature contributions. To the best of our knowledge, this is the first attempt to explain XGBoost classification for the diagnosis of heart diseases using five explainable techniques.
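The idea of ranking features by their contributions and keeping only the strongest ones as a reduced subset can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it computes exact Shapley values by enumerating feature coalitions rather than calling the SHAP library, it uses a hand-written toy scoring function in place of a trained XGBoost classifier, and the feature names (`f0` to `f3`), the samples, the all-zero baseline, and the helper names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of one sample x, measured against a baseline sample."""
    n = len(x)

    def value(subset):
        # Features in the coalition keep their real value; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_contributions(predict, samples, baseline):
    """Average absolute Shapley value per feature over a set of samples."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(p)
    return [t / len(samples) for t in totals]


def top_k_features(names, scores, k):
    """Names of the k features with the largest scores (the reduced subset)."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def toy_model(z):
    # Hypothetical stand-in for a trained classifier's raw score.
    return 3.0 * z[0] + 1.0 * z[1] + 0.1 * z[2] + 2.0 * z[0] * z[3]


# Hypothetical data, used only to exercise the functions above.
feature_names = ["f0", "f1", "f2", "f3"]
samples = [[1, 1, 1, 1], [2, 0, 1, 1], [0, 1, 2, 1]]
baseline = [0, 0, 0, 0]

scores = mean_abs_contributions(toy_model, samples, baseline)
reduced_subset = top_k_features(feature_names, scores, k=2)
print(reduced_subset)
```

Exact enumeration scales exponentially with the number of features, so it is only practical for a handful of them; tree-specific explainers exist precisely to avoid that cost on models such as XGBoost.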