Filannino Michele, Uzuner Özlem
George Mason University, Fairfax, VA, USA.
Massachusetts Institute of Technology, Cambridge, MA, USA.
Yearb Med Inform. 2018 Aug;27(1):184-192. doi: 10.1055/s-0038-1667079. Epub 2018 Aug 29.
Objectives: To review the latest scientific challenges organized in clinical Natural Language Processing (NLP), highlighting the tasks, the most effective methodologies used, the data, and the sharing strategies.
Methods: We harvested the literature using Google Scholar and PubMed Central to retrieve all shared tasks organized since 2015 on clinical NLP problems on English data.
Results: We surveyed 17 shared tasks. We grouped the data into four types (synthetic, drug labels, social data, and clinical data), which correlate with dataset size and sensitivity. We found named entity recognition and classification to be the most common tasks. Most of the methods used to tackle the shared tasks have been data-driven. The methods used for named entity recognition tasks are homogeneous, whereas more diverse solutions have been investigated for relation extraction, multi-class classification, and information retrieval problems.
Conclusions: There is a clear trend toward using data-driven methods to tackle problems in clinical NLP. The availability of more, and more varied, data from different institutions will undoubtedly lead to bigger advances in the field, for the benefit of healthcare as a whole.