Cuzzola John, Jovanović Jelena, Bagheri Ebrahim
Laboratory for Systems, Software and Semantics (LS3), Ryerson University, Ontario, Canada(1).
Faculty of Organizational Sciences (FOS), University of Belgrade, Belgrade, Serbia(2).
J Biomed Inform. 2017 Jul;71:91-109. doi: 10.1016/j.jbi.2017.05.016. Epub 2017 May 26.
Recently, both researchers and practitioners have explored the possibility of semantically annotating large and continuously evolving collections of biomedical texts, such as research papers, medical reports, and physician notes, to enable their efficient and effective management and use in clinical practice or research laboratories. Such annotations can be generated automatically by biomedical semantic annotators, tools specifically designed to detect and disambiguate biomedical concepts mentioned in text. The biomedical community has already presented several solid automated semantic annotators. However, the existing tools either are strong in disambiguation capacity, i.e., the ability to identify the correct biomedical concept for a given piece of text among several candidate concepts, or excel in processing time, i.e., work very efficiently; none of the semantic annotation tools reported in the literature has both qualities. In this paper, we present RysannMD (Ryerson Semantic Annotator for Medical Domain), a biomedical semantic annotation tool that strikes a balance between processing time and disambiguation performance. In other words, RysannMD provides reasonable disambiguation performance when choosing the right sense for a biomedical term in a given context, and does so in a reasonable time. To examine how RysannMD stands with respect to state-of-the-art biomedical semantic annotators, we conducted a series of experiments using standard benchmarking corpora, including both gold and silver standards, and four modern biomedical semantic annotators, namely cTAKES, MetaMap, NOBLE Coder, and Neji. The annotators were compared with respect to the quality of the produced annotations, measured against gold and silver standards using precision, recall, and F measure, and with respect to speed, i.e., processing time.
In the experiments, RysannMD achieved the best median F measure across the benchmarking corpora, independent of the standard used (silver/gold), biomedical subdomain, and document size. In terms of annotation speed, RysannMD achieved the second-best median processing time across all the experiments. The results indicate that RysannMD offers the best performance among the examined semantic annotators when annotation quality and speed are considered together.
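As an illustration of the quality metrics named in the abstract, the following minimal sketch computes precision, recall, and F measure for one document and then takes a median over several corpora. It is not the authors' evaluation code. It assumes exact-match scoring of (start, end, concept identifier) triples and the balanced F1 form of the F measure, neither of which the abstract specifies. The function name, the annotation triples, and the per-corpus scores are all hypothetical toy values, not results from the paper.

```python
from statistics import median


def precision_recall_f1(predicted, gold):
    """Score predicted annotations against a reference standard.

    Assumption: an annotation counts as correct only if its span and
    concept identifier both match the reference exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy annotations: (start offset, end offset, concept identifier).
gold = {(0, 8, "C0011849"), (15, 22, "C0020538"), (30, 37, "C0004057")}
predicted = {(0, 8, "C0011849"), (15, 22, "C0020538"), (40, 45, "C0027051")}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")

# Hypothetical per-corpus F1 scores for one annotator, summarized by the median.
per_corpus_f1 = [0.5, 0.7, 0.6]
median_f1 = median(per_corpus_f1)
print(f"median F1 across corpora: {median_f1}")
```

Summarizing by the median rather than the mean keeps a single unusually easy or hard corpus from dominating the comparison between annotators.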