Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:1615-1618. doi: 10.1109/EMBC48229.2022.9871798.
Abstractive summarization has recently advanced in domains such as news articles, scientific articles, and blog posts, but its application to clinical text has been limited. This is primarily due to the lack of large-scale training data and the messy, unstructured nature of clinical notes, in contrast to other domains where massive training data come in structured or semi-structured form. Further, factual accuracy is one of the least explored yet most critical components of clinical text summarization. It is especially crucial in healthcare, and in cardiology in particular, where a summary that preserves the facts in the source notes is critical to the well-being of a patient. In this study, we propose a framework for improving the factual accuracy of abstractive summarization of clinical text using knowledge-guided multi-objective optimization. We jointly optimize three cost functions during training: generative loss, entity loss, and knowledge loss. We evaluate the proposed architecture on 1) clinical notes of patients with heart failure (HF), which we collect for this study; and 2) two publicly available benchmark datasets, the Indiana University Chest X-ray collection (IU X-Ray) and MIMIC-CXR. We experiment with three transformer encoder-decoder architectures and demonstrate that optimizing different loss functions leads to improved performance in terms of entity-level factual accuracy.
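The abstract names three jointly optimized losses and an entity-level accuracy measure but gives no formulas for either. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the three losses are combined as a weighted sum, and that entity-level factual accuracy can be approximated by the overlap between entities in the source note and entities in the generated summary. The function names, the weights, and the assumption that entities have already been extracted by an upstream tagger are all hypothetical.

```python
from typing import Dict, Iterable


def joint_loss(generative: float, entity: float, knowledge: float,
               w_gen: float = 1.0, w_ent: float = 1.0, w_kn: float = 1.0) -> float:
    """Combine the three training losses into one scalar objective.

    Assumption: a weighted sum. The abstract only says the losses are
    optimized jointly; the weights here are hypothetical.
    """
    return w_gen * generative + w_ent * entity + w_kn * knowledge


def entity_scores(source_entities: Iterable[str],
                  summary_entities: Iterable[str]) -> Dict[str, float]:
    """Entity-overlap precision, recall, and F1 (an assumed proxy metric).

    precision: share of summary entities that also appear in the source
    recall:    share of source entities that are kept in the summary
    """
    # Case-insensitive exact matching; a real system would likely
    # normalize entities to concept identifiers first.
    src = {e.strip().lower() for e in source_entities}
    summ = {e.strip().lower() for e in summary_entities}
    supported = src & summ

    precision = len(supported) / len(summ) if summ else 0.0
    recall = len(supported) / len(src) if src else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy example: the summary keeps one source entity and adds an
    # unsupported one ("pneumonia").
    source = ["heart failure", "edema", "furosemide"]
    summary = ["Heart failure", "pneumonia"]
    print(entity_scores(source, summary))
    print(joint_loss(1.0, 2.0, 3.0, w_ent=0.5, w_kn=0.5))
```

In this toy case, the unsupported entity lowers precision and the two dropped source entities lower recall, which is the kind of error an entity-level measure is meant to expose.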