Sarker Abeed, Yang Yuan-Chi, Al-Garadi Mohammed Ali, Abbas Aamir
Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States.
Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States.
Front Digit Health. 2020 Dec 4;2:585559. doi: 10.3389/fdgth.2020.585559. eCollection 2020.
As the volume of published medical research continues to grow rapidly, staying up to date with the best available research evidence on specific topics is becoming increasingly challenging for medical experts and researchers. The current COVID-19 pandemic is a good example of a topic on which research evidence is evolving rapidly. Automatic query-focused text summarization approaches may help researchers review research evidence swiftly by presenting salient, query-relevant information from newly published articles in condensed form. Typical medical text summarization approaches require domain knowledge, and the performance of such systems relies on resource-heavy, domain-specific medical knowledge sources and on pre-processing methods (e.g., text classification) for deriving semantic information. Consequently, these systems are often difficult to customize, extend, or deploy quickly in low-resource settings, and they are often operationally slow. In this paper, we propose a fast and simple extractive summarization approach that can be easily deployed and run, and may thus help medical experts and researchers obtain fast access to the latest research evidence. At runtime, our system uses similarity measurements derived from pre-trained medical domain-specific word embeddings, in addition to simple features, rather than computationally expensive pre-processing and resource-heavy knowledge bases. Automatic evaluation using ROUGE, a summary evaluation tool, on a public dataset for evidence-based medicine shows that our system's performance, despite the simple implementation, is statistically comparable to the state of the art. Extrinsic manual evaluation based on recently released COVID-19 articles shows that the summarizer's performance is close to human agreement, which is generally low for extractive summarization.
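To make the runtime idea concrete, the following is a minimal sketch of query-focused extractive summarization that scores sentences by embedding similarity plus one simple feature. It is not the authors' implementation: the abstract does not specify the embeddings, features, or weights, so everything here is an illustrative assumption. In particular, `TOY_EMBEDDINGS` is a hypothetical three-dimensional stand-in for pre-trained medical word embeddings, sentence position is an assumed example of a "simple feature", and `summarize`, `k`, and `position_weight` are hypothetical names.

```python
import math
import re

# Hypothetical toy vectors standing in for pre-trained medical word embeddings.
TOY_EMBEDDINGS = {
    "covid": [0.9, 0.1, 0.0],
    "virus": [0.8, 0.2, 0.1],
    "transmission": [0.7, 0.3, 0.0],
    "airborne": [0.6, 0.4, 0.1],
    "funding": [0.0, 0.1, 0.9],
    "grant": [0.1, 0.0, 0.8],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mean_vector(tokens):
    """Average the vectors of known tokens; return None if none are known."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is missing or zero."""
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def summarize(query, sentences, k=2, position_weight=0.1):
    """Return the k highest-scoring sentences in their original order.

    Score = query-sentence cosine similarity
            + position_weight * (assumed feature favoring earlier sentences).
    """
    query_vec = mean_vector(tokenize(query))
    n = len(sentences)
    scored = []
    for i, sentence in enumerate(sentences):
        similarity = cosine(query_vec, mean_vector(tokenize(sentence)))
        position = 1.0 - i / n  # 1.0 for the first sentence, decreasing after
        scored.append((similarity + position_weight * position, i))
    top = sorted(scored, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

In a realistic setting the toy dictionary would be replaced by vectors loaded from a pre-trained embedding file, and the single position feature by whatever features the full paper describes. The sketch only shows why such a pipeline can run without heavy knowledge bases: scoring reduces to vector lookups and arithmetic.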