Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America.
Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, United States of America.
PLoS One. 2020 May 12;15(5):e0232840. doi: 10.1371/journal.pone.0232840. eCollection 2020.
Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information about the entire cancer case, based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that, with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks.
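The idea of a modular case-level add-on can be illustrated with a small sketch. The abstract does not specify the mechanism used in the paper, so the aggregation below (a running mean over the reports seen so far in a case) is an assumption chosen purely for illustration, and `add_case_context` is a hypothetical name. The sketch assumes that some existing per-report text classifier has already produced one fixed-length embedding per report, and that the reports of a case are given in chronological order.

```python
from typing import List, Sequence

Vector = List[float]


def add_case_context(report_embeddings: Sequence[Vector]) -> List[Vector]:
    """Augment each report embedding with a simple case-level context vector.

    Illustrative assumption: the context for report i is the mean of the
    embeddings of reports 1..i in the same case. This is a stand-in for a
    learned sequence module, not the method from the paper.
    """
    if not report_embeddings:
        return []

    dim = len(report_embeddings[0])
    running_sum = [0.0] * dim
    augmented: List[Vector] = []

    for count, embedding in enumerate(report_embeddings, start=1):
        if len(embedding) != dim:
            raise ValueError("all report embeddings must have the same length")
        # Accumulate the case history, including the current report.
        running_sum = [s + x for s, x in zip(running_sum, embedding)]
        context = [s / count for s in running_sum]
        # Concatenate the report's own embedding with the case-level context.
        augmented.append(list(embedding) + context)

    return augmented


# Toy case with three reports and 2-dimensional embeddings (made-up values).
case = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = add_case_context(case)
```

Each augmented vector could then be passed to the per-report classification layer in place of the original embedding, which is what makes the add-on independent of the encoder that produced the embeddings. In practice the fixed running mean would be replaced by a trainable sequence component.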