Patient representation learning and interpretable evaluation using clinical notes.

Author affiliations

Antwerp University Hospital, ICT Department, Wilrijkstraat 10, Edegem 2650, Belgium; Computational Linguistics and Psycholinguistics (CLiPS) Research Center, University of Antwerp, Prinsstraat 13, Antwerp 2000, Belgium.

Computational Linguistics and Psycholinguistics (CLiPS) Research Center, University of Antwerp, Prinsstraat 13, Antwerp 2000, Belgium.

Publication information

J Biomed Inform. 2018 Aug;84:103-113. doi: 10.1016/j.jbi.2018.06.016. Epub 2018 Jul 3.

Abstract

This work makes three contributions.

  1. We explore the utility of a stacked denoising autoencoder and a paragraph vector model for learning task-independent dense patient representations directly from clinical notes. To analyze whether these representations transfer across tasks, we evaluate them in multiple supervised setups that predict patient mortality, primary diagnostic and procedural category, and gender. We compare their performance with sparse representations obtained from a bag-of-words model. We observe that the learned generalized representations significantly outperform the sparse representations when there are few positive instances to learn from and strong lexical features are absent.

  2. We compare model performance with a feature set constructed from a bag of words against one constructed from medical concepts. In the latter case, concepts represent problems, treatments, and tests. We find that concept identification does not improve classification performance.

  3. We propose novel techniques to facilitate model interpretability. To understand and interpret the representations, we explore the best-encoded features within the patient representations obtained from the autoencoder model. Further, we calculate feature sensitivity across two networks to identify the most significant input features for different classification tasks when these pretrained representations are used as the supervised input. Using this technique, we successfully extract the most influential features for the pipeline.
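The idea of feature sensitivity across two networks can be illustrated with a small sketch. The abstract does not give the formulation, so everything below is an assumption made for illustration: the sensitivity is estimated with central finite differences, both networks are single sigmoid layers, and the weights (`ENCODER`, `CLASSIFIER`) and feature names are hypothetical toy values rather than anything from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, biases):
    """One sigmoid layer; weights[j] holds the input weights of output unit j."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def pipeline(x, encoder, classifier):
    """Encoder network (pretrained representation) followed by a classifier network."""
    hidden = dense(x, *encoder)
    return dense(hidden, *classifier)[0]


def sensitivities(x, encoder, classifier, eps=1e-5):
    """Central finite-difference estimate of d(output)/d(x_i) for every input i."""
    scores = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        diff = pipeline(up, encoder, classifier) - pipeline(down, encoder, classifier)
        scores.append(diff / (2 * eps))
    return scores


# Hypothetical toy weights: 3 bag-of-words inputs -> 2 hidden units -> 1 output.
ENCODER = ([[2.0, 0.0, 0.1], [0.0, -1.5, 0.1]], [0.0, 0.0])
CLASSIFIER = ([[3.0, 3.0]], [-3.0])


def rank_features(x, names):
    """Sort input features by the magnitude of their sensitivity, largest first."""
    scores = sensitivities(x, ENCODER, CLASSIFIER)
    return sorted(zip(names, scores), key=lambda pair: abs(pair[1]), reverse=True)
```

Calling `rank_features([1.0, 0.0, 1.0], ["sepsis", "discharge", "patient"])` returns the three hypothetical features ordered by how strongly a small change in each input moves the final prediction through both networks. A real pipeline would more likely obtain the same gradients by backpropagation through the two networks.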
