Explainable text-tabular models for predicting mortality risk in companion animals.

Affiliations

Department of Computer Science, Durham University, Durham, UK.

Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK.

Publication information

Sci Rep. 2024 Jun 20;14(1):14217. doi: 10.1038/s41598-024-64551-1.

Abstract

As interest in using machine learning models to support clinical decision-making increases, explainability is an unequivocal priority for clinicians, researchers and regulators to comprehend and trust their results. With many clinical datasets containing a range of modalities, from the free-text of clinician notes to structured tabular data entries, there is a need for frameworks capable of providing comprehensive explanation values across diverse modalities. Here, we present a multimodal masking framework to extend the reach of SHapley Additive exPlanations (SHAP) to text and tabular datasets to identify risk factors for companion animal mortality in first-opinion veterinary electronic health records (EHRs) from across the United Kingdom. The framework is designed to treat each modality consistently, ensuring uniform and consistent treatment of features and thereby fostering predictability in unimodal and multimodal contexts. We present five multimodality approaches, with the best-performing method utilising PetBERT, a language model pre-trained on a veterinary dataset. Utilising our framework, we shed light for the first time on the reasons each model makes its decision and identify the inclination of PetBERT towards a more pronounced engagement with free-text narratives compared to BERT-base's predominant emphasis on tabular data. The investigation also explores the important features on a more granular level, identifying distinct words and phrases that substantially influenced an animal's life status prediction. PetBERT showcased a heightened ability to grasp phrases associated with veterinary clinical nomenclature, signalling the productivity of additional pre-training of language models.
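The idea of masking text and tabular features in one consistent way before computing Shapley values can be illustrated with a small sketch. This is not the authors' framework or the SHAP library's API: the model `toy_model`, the `[MASK]` placeholder, the baseline values in `TABULAR_BASELINE`, and the feature names are all hypothetical, chosen only so that exact Shapley values can be enumerated on a tiny example.

```python
from itertools import combinations
from math import factorial

MASK_TOKEN = "[MASK]"  # assumed placeholder for a hidden text token
TABULAR_BASELINE = {"age_years": 5.0, "is_neutered": 1.0}  # assumed background values


def toy_model(tokens, tabular):
    """Hypothetical stand-in for a text-tabular classifier; returns a risk score."""
    score = 0.05 * tabular["age_years"]
    if "lethargic" in tokens:
        score += 0.3
    if "collapse" in tokens:
        score += 0.4
        if tabular["age_years"] > 10:
            score += 0.2  # cross-modal interaction between a word and a column
    return score


def masked_predict(tokens, tabular, keep):
    """Score the record with every feature outside `keep` masked out."""
    masked_tokens = [
        tok if ("text", i) in keep else MASK_TOKEN for i, tok in enumerate(tokens)
    ]
    masked_tabular = {
        name: (value if ("tab", name) in keep else TABULAR_BASELINE[name])
        for name, value in tabular.items()
    }
    return toy_model(masked_tokens, masked_tabular)


def shapley_values(tokens, tabular):
    """Exact Shapley values over the joint set of text and tabular features."""
    features = [("text", i) for i in range(len(tokens))]
    features += [("tab", name) for name in tabular]
    n = len(features)
    values = {}
    for feature in features:
        others = [f for f in features if f != feature]
        phi = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                keep = set(subset)
                with_f = masked_predict(tokens, tabular, keep | {feature})
                without_f = masked_predict(tokens, tabular, keep)
                phi += weight * (with_f - without_f)
        values[feature] = phi
    return values


def modality_totals(values):
    """Sum attributions per modality to compare reliance on text vs. tabular."""
    totals = {"text": 0.0, "tab": 0.0}
    for (modality, _), phi in values.items():
        totals[modality] += phi
    return totals


tokens = ["dog", "lethargic", "collapse"]
tabular = {"age_years": 12.0, "is_neutered": 0.0}
values = shapley_values(tokens, tabular)
totals = modality_totals(values)
```

Because both modalities share one masking function, the attributions for words and for table columns are on the same scale and sum to the difference between the full and fully masked predictions. Exhaustive enumeration is only feasible for a handful of features; real clinical notes would need a sampling-based approximation.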

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a81/11190214/7a2ebcc478f6/41598_2024_64551_Fig1_HTML.jpg
