Hassanzadeh Hamed, Groza Tudor, Hunter Jane
School of ITEE, The University of Queensland, Australia.
J Biomed Inform. 2014 Jun;49:159-70. doi: 10.1016/j.jbi.2014.02.006. Epub 2014 Feb 14.
Evidence-Based Medicine (EBM) provides a framework that uses the current best evidence in the domain to support clinicians in the decision-making process. In most cases, the underlying foundational knowledge is captured in scientific publications that detail specific clinical studies or randomised controlled trials. Over the last two decades, research has focused on modelling the key aspects described within publications (e.g., aims, methods, results) in order to realise the goals of EBM. A significant outcome of this research has been the PICO (Population/Problem-Intervention-Comparison-Outcome) structure and its refined version, PIBOSO (Population-Intervention-Background-Outcome-Study Design-Other), both of which formalise these scientific artefacts. Building on these schemes, diverse automatic extraction techniques have been proposed to streamline knowledge discovery and exploration in EBM. In this paper, we present a Machine Learning approach that classifies sentences according to the PIBOSO scheme. We use a discriminative set of features that does not rely on any external resources to achieve results comparable to the state of the art. The NICTA-PIBOSO corpus of 1000 structured and unstructured abstracts is used for training and testing. Our best CRF classifier achieves micro-averaged F-scores of 90.74% and 87.21% over structured and unstructured abstracts, respectively, an increase of 25.48 and 26.6 percentage points in F-score over the best existing approaches.
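As an illustration of the reported metric, the sketch below shows one common way to compute a micro-averaged F-score over sentence-level PIBOSO labels. It is not the authors' evaluation code: the function name `micro_f_score`, the toy gold and predicted labels, and the assumption that a sentence may carry more than one label are all hypothetical choices made for this example.

```python
# Minimal sketch of a micro-averaged F-score (assumed formulation, not the
# paper's own evaluation script). Each sentence is represented by a set of
# labels, so a sentence may carry more than one PIBOSO category.

def micro_f_score(gold, predicted):
    """Pool true positives, false positives and false negatives over all
    sentences and labels, then compute a single F-score from the totals."""
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        tp += len(gold_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - gold_labels)   # predicted but not in gold
        fn += len(gold_labels - pred_labels)   # in gold but not predicted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: five sentences from one abstract.
gold = [
    {"Background"},
    {"Population"},
    {"Intervention"},
    {"Outcome"},
    {"Outcome", "Study Design"},
]
predicted = [
    {"Background"},
    {"Population"},
    {"Outcome"},
    {"Outcome"},
    {"Outcome"},
]

score = micro_f_score(gold, predicted)
print(f"micro-averaged F-score: {score:.4f}")  # prints 0.7273 (= 8/11)
```

Because the counts are pooled before the F-score is computed, frequent labels weigh more heavily than rare ones. A macro-average would instead compute one F-score per label and take their mean.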