Mishra Rashmi, Bian Jiantao, Fiszman Marcelo, Weir Charlene R, Jonnalagadda Siddhartha, Mostafa Javed, Del Fiol Guilherme
Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.
Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Clinical Modeling Team, Intermountain Healthcare, Salt Lake City, UT, USA.
J Biomed Inform. 2014 Dec;52:457-67. doi: 10.1016/j.jbi.2014.06.009. Epub 2014 Jul 10.
The amount of information available to clinicians and clinical researchers is growing exponentially. Text summarization condenses information in an attempt to help users find and understand relevant source texts more quickly and with less effort. In recent years, substantial research has been conducted to develop and evaluate summarization techniques in the biomedical domain. The goal of this study was to systematically review recently published research on the summarization of textual documents in the biomedical domain.
MEDLINE (2000 to October 2013), the IEEE Digital Library, and the ACM Digital Library were searched. Investigators independently screened and abstracted studies that examined text summarization techniques in the biomedical domain. Information was extracted from the selected articles along five dimensions: input, purpose, output, method, and evaluation.
Of 10,786 studies retrieved, 34 (0.3%) met the inclusion criteria. Natural language processing (17; 50%) and hybrid techniques combining statistical, natural language processing, and machine learning methods (15; 44%) were the most common summarization approaches. Most studies (28; 82%) conducted an intrinsic evaluation.
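The reported proportions follow directly from the counts: the included studies are a fraction of all retrieved records, while the approach and evaluation percentages are fractions of the 34 included studies. The short Python sketch below recomputes them; the variable and function names are illustrative only.

```python
# Counts reported in the Results paragraph of the abstract.
RETRIEVED = 10_786
INCLUDED = 34

# (part, whole) pairs; approach and evaluation shares use the 34 included studies.
COUNTS = {
    "included of retrieved": (INCLUDED, RETRIEVED),
    "natural language processing": (17, INCLUDED),
    "hybrid technique": (15, INCLUDED),
    "intrinsic evaluation": (28, INCLUDED),
}


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


if __name__ == "__main__":
    for label, (part, whole) in COUNTS.items():
        print(f"{label}: {part}/{whole} = {percentage(part, whole):.1f}%")
```

Rounded to whole numbers (or one decimal for the 0.3% figure), these match the values in the abstract.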
This is the first systematic review of text summarization in the biomedical domain. The study identifies research gaps and provides recommendations to guide future research on biomedical text summarization.
Recent research has focused on hybrid techniques combining statistical, natural language processing, and machine learning methods. Further research is needed on the application and evaluation of text summarization in real research or patient care settings.
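Statistical methods are one component of the hybrid summarizers described above. As a rough illustration only, the Python sketch below shows a generic frequency-based extractive summarizer: it scores each sentence by the average frequency of its content words and keeps the top-scoring sentences in their original order. This is an assumed textbook-style heuristic, not a method taken from any of the reviewed studies; the stop-word list and the functions `split_sentences`, `tokenize`, and `summarize` are hypothetical names chosen for this example.

```python
import re
from collections import Counter

# Hypothetical, minimal stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for",
              "on", "with", "as", "by", "that", "this"}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Return lowercase word tokens, excluding stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences in their original order.

    Each sentence is scored by the mean frequency (across the whole text)
    of its content words, a simple statistical heuristic.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore original document order
    return [sentences[i] for i in keep]
```

For example, `summarize("Summarization helps clinicians. Summarization reduces information for clinicians. Cats sleep.", k=2)` keeps the two sentences about summarization and drops the off-topic one. A hybrid system would typically combine such statistical scores with linguistic features and a learned model.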