

Patient representation learning and interpretable evaluation using clinical notes.

Author information

Antwerp University Hospital, ICT Department, Wilrijkstraat 10, Edegem 2650, Belgium; Computational Linguistics and Psycholinguistics (CLiPS) Research Center, University of Antwerp, Prinsstraat 13, Antwerp 2000, Belgium.

Computational Linguistics and Psycholinguistics (CLiPS) Research Center, University of Antwerp, Prinsstraat 13, Antwerp 2000, Belgium.

Publication information

J Biomed Inform. 2018 Aug;84:103-113. doi: 10.1016/j.jbi.2018.06.016. Epub 2018 Jul 3.

Abstract

We have three contributions in this work:

  1. We explore the utility of a stacked denoising autoencoder and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes. To analyze if these representations are transferable across tasks, we evaluate them in multiple supervised setups to predict patient mortality, primary diagnostic and procedural category, and gender. We compare their performance with sparse representations obtained from a bag-of-words model. We observe that the learned generalized representations significantly outperform the sparse representations when we have few positive instances to learn from, and there is an absence of strong lexical features.

  2. We compare the model performance of the feature set constructed from a bag of words to that obtained from medical concepts. In the latter case, concepts represent problems, treatments, and tests. We find that concept identification does not improve the classification performance.

  3. We propose novel techniques to facilitate model interpretability. To understand and interpret the representations, we explore the best encoded features within the patient representations obtained from the autoencoder model. Further, we calculate feature sensitivity across two networks to identify the most significant input features for different classification tasks when we use these pretrained representations as the supervised input. We successfully extract the most influential features for the pipeline using this technique.
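To make the first contribution more concrete, the following is a minimal sketch of a denoising autoencoder that turns sparse bag-of-words vectors into dense representations. It is not the authors' implementation: it uses a single hidden layer rather than a stacked model, plain NumPy, and toy notes. The function names, masking-noise rate, hidden size, and learning rate are all illustrative assumptions.

```python
import numpy as np


def bag_of_words(notes, vocab):
    """Binary bag-of-words matrix: one row per note, one column per vocab term."""
    index = {term: j for j, term in enumerate(vocab)}
    X = np.zeros((len(notes), len(vocab)))
    for i, note in enumerate(notes):
        for token in note.lower().split():
            if token in index:
                X[i, index[token]] = 1.0
    return X


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_autoencoder(X, n_hidden=3, drop_prob=0.3, lr=0.5,
                                epochs=500, seed=0):
    """Train a one-layer denoising autoencoder with full-batch gradient descent.

    Inputs are corrupted by randomly zeroing features (masking noise, an
    assumption here); the network is trained to reconstruct the clean input.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    eps = 1e-9
    for _ in range(epochs):
        # Corrupt the input: each feature is dropped with probability drop_prob.
        keep = rng.random(X.shape) >= drop_prob
        X_noisy = X * keep
        # Forward pass: encode the noisy input, decode to a reconstruction.
        H = sigmoid(X_noisy @ W1 + b1)
        X_hat = sigmoid(H @ W2 + b2)
        # Cross-entropy reconstruction loss against the clean input.
        loss = -np.mean(np.sum(X * np.log(X_hat + eps)
                               + (1 - X) * np.log(1 - X_hat + eps), axis=1))
        losses.append(loss)
        # Backward pass (sigmoid output with cross-entropy loss).
        dZ2 = (X_hat - X) / n
        dW2 = H.T @ dZ2
        db2 = dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)
        dW1 = X_noisy.T @ dZ1
        db1 = dZ1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def encode(X, params):
    """Dense representation: hidden activations for the uncorrupted input."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)


# Toy data standing in for clinical notes (hypothetical, not from the paper).
vocab = ["fever", "cough", "pneumonia", "fracture", "surgery", "pain"]
notes = [
    "fever cough pneumonia",
    "cough pneumonia pain",
    "fracture surgery pain",
    "fracture pain",
]
X = bag_of_words(notes, vocab)
params, losses = train_denoising_autoencoder(X)
dense = encode(X, params)  # shape: (number of notes, n_hidden)
```

In a setup like the one the abstract describes, the rows of `dense` would then serve as fixed input features for separate supervised classifiers, while the rows of `X` play the role of the sparse bag-of-words baseline.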

