Zhou Pei-Yuan, Takeuchi Amane, Martinez-Lopez Fernando, Ehghaghi Malikeh, Wong Andrew K C, Lee En-Shiun Annie
System Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
Department of Computer Science, University of Toronto, Toronto, ON M5S 1A1, Canada.
Bioengineering (Basel). 2025 Mar 18;12(3):308. doi: 10.3390/bioengineering12030308.
The healthcare industry seeks to integrate AI into clinical applications, yet understanding AI decision making remains a challenge for healthcare practitioners because these systems often function as black boxes. Our work benchmarks the unsupervised learning algorithm of the Pattern Discovery and Disentanglement (PDD) system, which provides interpretable outputs and clustering results from clinical notes to aid decision making. Using the MIMIC-IV dataset, we process free-text clinical notes and ICD-9 codes with Term Frequency-Inverse Document Frequency (TF-IDF) and Topic Modeling. The PDD algorithm discretizes numerical features into event-based features, discovers association patterns from a disentangled statistical feature value association space, and clusters clinical records. The output is an interpretable knowledge base linking knowledge, patterns, and data to support decision making. Despite being unsupervised, PDD demonstrated performance comparable to supervised deep learning models, validating its clustering ability and knowledge representation. We benchmark three interpretability techniques (Feature Permutation, Gradient SHAP, and Integrated Gradients) on the best-performing models, selected by F1, ROC AUC, balanced accuracy, and similar metrics, and evaluate them on sufficiency, comprehensiveness, and sensitivity. Our findings highlight the limitations of feature importance ranking and post hoc analysis for clinical diagnosis. In contrast, PDD's global interpretability compensates for these limitations, helping healthcare practitioners understand the decision-making process and providing suggestive clusters of diseases to assist their diagnosis.
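The preprocessing steps named in the abstract (TF-IDF weighting of note text, followed by discretization of numerical features into categorical events) can be illustrated with a minimal standard-library sketch. This is not the PDD implementation: the function names `tfidf` and `equal_frequency_bins`, the whitespace tokenizer, the unsmoothed `tf * log(N / df)` weighting, and the choice of equal-frequency binning are all assumptions made for illustration, and the sample notes are invented.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (plain tf * idf, no smoothing)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def equal_frequency_bins(values, n_bins=3):
    """Map each value to a bin index so that bins hold roughly equal counts.

    Assumption: equal-frequency binning stands in for the discretization step.
    Tied values may fall into different bins in this simple version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


# Hypothetical toy notes, not MIMIC-IV data.
notes = ["chest pain fever", "chest pain cough", "fracture"]
note_weights = tfidf(notes)
events = equal_frequency_bins([0.1, 0.9, 0.5, 0.3, 0.7, 0.2], n_bins=3)
```

In this sketch a term that appears in fewer notes (such as "fever") receives a higher weight than one shared across notes (such as "chest"), and each numeric score is replaced by a low, medium, or high bin index that a pattern discovery step could treat as a discrete event.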
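The sufficiency and comprehensiveness metrics mentioned in the abstract can be sketched as follows. The abstract does not give the paper's exact definitions, so this assumes a common formulation: comprehensiveness is the drop in predicted probability when the top-k attributed features are removed, and sufficiency is the drop when only the top-k features are kept. The names `top_k_indices`, `comprehensiveness`, `sufficiency`, and `toy_predict`, the zero baseline, and the toy logistic model are hypothetical.

```python
import math


def top_k_indices(attributions, k):
    """Indices of the k features with the largest absolute attribution."""
    ranked = sorted(range(len(attributions)),
                    key=lambda i: abs(attributions[i]), reverse=True)
    return set(ranked[:k])


def comprehensiveness(predict, x, attributions, k, baseline=0.0):
    """Drop in predicted probability when the top-k features are replaced by the baseline."""
    top = top_k_indices(attributions, k)
    masked = [baseline if i in top else v for i, v in enumerate(x)]
    return predict(x) - predict(masked)


def sufficiency(predict, x, attributions, k, baseline=0.0):
    """Drop in predicted probability when only the top-k features are kept."""
    top = top_k_indices(attributions, k)
    kept = [v if i in top else baseline for i, v in enumerate(x)]
    return predict(x) - predict(kept)


def toy_predict(x):
    """Hypothetical logistic model with fixed weights, used only for the demo."""
    weights = [2.0, 0.0, -1.0]
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


x = [1.0, 1.0, 1.0]
attributions = [2.0, 0.0, -1.0]  # weight * input, a simple stand-in for an attribution method

comp = comprehensiveness(toy_predict, x, attributions, k=1)
suff = sufficiency(toy_predict, x, attributions, k=1)
```

Under this formulation, a faithful attribution tends to give a high comprehensiveness score (removing the top features hurts the prediction) and a low sufficiency score (the top features alone nearly reproduce it). Attribution scores from Feature Permutation, Gradient SHAP, or Integrated Gradients could be passed in place of the toy `attributions` list.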