Nye Benjamin, Li Junyi Jessy, Patel Roma, Yang Yinfei, Marshall Iain J, Nenkova Ani, Wallace Byron C
Northeastern University,
UT Austin,
Proc Conf Assoc Comput Linguist Meet. 2018 Jul;2018:197-207.
We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the 'PICO' elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying levels of expertise and cost. We describe our data collection process and the corpus itself in detail. We then outline a set of challenging NLP tasks that would aid searching of the medical literature and the practice of evidence-based medicine.