Joseph Lionel P, Joseph Erica A, Prasad Ramendra
School of Mathematics, Physics, and Computing, University of Southern Queensland, Springfield, QLD, 4300, Australia.
Umanand Prasad School of Medicine and Health Sciences, The University of Fiji, Saweni, Lautoka, Fiji.
Comput Biol Med. 2022 Dec;151(Pt A):106178. doi: 10.1016/j.compbiomed.2022.106178. Epub 2022 Oct 6.
Diabetes is a deadly chronic disease that occurs when the pancreas cannot produce sufficient insulin or when the body cannot use insulin effectively. If undetected, it may lead to a host of health complications. Hence, accurate and explainable early-stage detection of diabetes is essential for the proper administration of treatment options and for leading a healthy and productive life. For this, we developed an interpretable TabNet model tuned via Bayesian optimization (BO). To achieve model-specific interpretability, the attention mechanism of the TabNet architecture was used, which offered local and global model explanations of the influence of the attributes on the outcomes. The model was further explained locally and globally using the more robust model-agnostic eXplainable Artificial Intelligence (XAI) tools LIME and SHAP. The proposed model outperformed all benchmarked models, obtaining high accuracies of 92.2% and 99.4% on the Pima Indians diabetes dataset (PIDD) and the early-stage diabetes risk prediction dataset (ESDRPD), respectively. Based on the XAI results, the most influential attributes for diabetes classification using PIDD and ESDRPD were Insulin and Polyuria, respectively. The feature importance values registered were 0.301 for insulin (PIDD) and 0.206 for polyuria (ESDRPD). The high accuracy and ancillary interpretability of our proposed model are expected to increase end users' trust and confidence in early-stage detection of diabetes.
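To illustrate how local (per-sample) attribute weights can roll up into a single global feature importance score of the kind reported above, the following is a minimal sketch in plain Python. It assumes that global importance is obtained by summing each attribute's per-sample weights and normalizing so the scores sum to 1; the abstract does not state the exact aggregation used. The function name `global_importance` is hypothetical, and the weights are made-up toy values, not results from the paper.

```python
from typing import Dict, Sequence


def global_importance(local_weights: Sequence[Sequence[float]],
                      feature_names: Sequence[str]) -> Dict[str, float]:
    """Aggregate per-sample (local) weights into normalized global scores.

    Assumption: global score = column sum of local weights / grand total.
    """
    if not local_weights:
        raise ValueError("need at least one sample")
    totals = [0.0] * len(feature_names)
    for row in local_weights:
        if len(row) != len(feature_names):
            raise ValueError("row length must match number of features")
        for j, weight in enumerate(row):
            totals[j] += weight
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("all weights are zero; cannot normalize")
    return {name: total / grand_total
            for name, total in zip(feature_names, totals)}


# Toy example: three samples, three PIDD-style attributes (values invented).
features = ["Glucose", "BMI", "Insulin"]
weights = [
    [0.2, 0.1, 0.7],
    [0.5, 0.0, 0.5],
    [0.1, 0.3, 0.6],
]
scores = global_importance(weights, features)
top_feature = max(scores, key=scores.get)  # attribute with the largest score
```

Each row of `weights` plays the role of a local explanation for one patient record, and `scores` plays the role of the global explanation. Because the scores are normalized, a value such as 0.301 for a single attribute would mean that attribute accounts for roughly 30% of the total attributed weight under this assumption.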