A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature.

Authors

Nye Benjamin, Jessy Li Junyi, Patel Roma, Yang Yinfei, Marshall Iain J, Nenkova Ani, Wallace Byron C

Affiliations

Northeastern University

UT Austin

Publication information

Proc Conf Assoc Comput Linguist Meet. 2018 Jul;2018:197-207.

PMID: 30305770

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6174533/

Abstract

We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the 'PICO' elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying levels of expertise and cost. We describe our data collection process and the corpus itself in detail. We then outline a set of challenging NLP tasks that would aid searching of the medical literature and the practice of evidence-based medicine.
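The two annotation levels described in the abstract (coarse PICO spans, each containing finer-grained spans mapped to a vocabulary) can be pictured as a nested data structure. The sketch below is only illustrative: the class names, label strings, token offsets, example sentence, and vocabulary term are all assumptions made for this example and are not taken from the corpus or its release format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical label set; the abstract names Patients, Interventions
# (with their comparators) and Outcomes, but these strings are invented.
PICO_LABELS = {"participants", "interventions", "outcomes"}


@dataclass
class FineSpan:
    """A granular annotation nested inside a coarse PICO span."""
    start: int       # token offset, inclusive
    end: int         # token offset, exclusive
    vocab_term: str  # placeholder for a structured-vocabulary entry


@dataclass
class PicoSpan:
    """A coarse span marking one PICO element in an abstract."""
    label: str
    start: int
    end: int
    children: List[FineSpan] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.label not in PICO_LABELS:
            raise ValueError(f"unknown label: {self.label}")

    def add_child(self, child: FineSpan) -> None:
        # A granular span must lie entirely within its parent span.
        if not (self.start <= child.start < child.end <= self.end):
            raise ValueError("fine-grained span must fall inside its parent")
        self.children.append(child)


def span_text(tokens: List[str], start: int, end: int) -> str:
    """Return the text covered by a token-offset span."""
    return " ".join(tokens[start:end])


# Invented sentence, not drawn from the corpus.
tokens = "Patients with asthma received inhaled budesonide or placebo".split()
intervention = PicoSpan("interventions", 4, 8)
intervention.add_child(FineSpan(4, 6, "budesonide (hypothetical entry)"))
```

Keeping the fine-grained spans as children of a coarse span makes the containment constraint explicit: a granular annotation that falls outside its parent is rejected rather than silently stored.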

Similar articles

A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature.

Proc Conf Assoc Comput Linguist Meet. 2018 Jul;2018:197-207.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

An annotated corpus of clinical trial publications supporting schema-based relational information extraction.

J Biomed Semantics. 2022 May 23;13(1):14. doi: 10.1186/s13326-022-00271-7.

Concept annotation in the CRAFT corpus.

BMC Bioinformatics. 2012 Jul 9;13:161. doi: 10.1186/1471-2105-13-161.

SemClinBr - a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks.

J Biomed Semantics. 2022 May 8;13(1):13. doi: 10.1186/s13326-022-00269-1.

Semi-supervised learning from small annotated data and large unlabeled data for fine-grained Participants, Intervention, Comparison, and Outcomes entity recognition.

J Am Med Inform Assoc. 2025 Mar 1;32(3):555-565. doi: 10.1093/jamia/ocae326.

NCBI disease corpus: a resource for disease name recognition and concept normalization.

J Biomed Inform. 2014 Feb;47:1-10. doi: 10.1016/j.jbi.2013.12.006. Epub 2014 Jan 3.

Comparing generative and extractive approaches to information extraction from abstracts describing randomized clinical trials.

J Biomed Semantics. 2024 Apr 23;15(1):3. doi: 10.1186/s13326-024-00305-2.

Pretraining to Recognize PICO Elements from Randomized Controlled Trial Literature.

Stud Health Technol Inform. 2019 Aug 21;264:188-192. doi: 10.3233/SHTI190209.

Assessment of disease named entity recognition on a corpus of annotated sentences.

BMC Bioinformatics. 2008 Apr 11;9 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2105-9-S3-S3.

Cited by

Large Language Model Analysis of Reporting Quality of Randomized Clinical Trial Articles: A Systematic Review.

JAMA Netw Open. 2025 Aug 1;8(8):e2529418. doi: 10.1001/jamanetworkopen.2025.29418.

Evaluation of SURUS: a named entity recognition NLP system to extract knowledge from interventional study records.

BMC Med Res Methodol. 2025 Jul 31;25(1):184. doi: 10.1186/s12874-025-02624-z.

Constructing public health evidence knowledge graph for decision-making support from COVID-19 literature of modelling study.

J Saf Sci Resil. 2021 Sep;2(3):146-156. doi: 10.1016/j.jnlssr.2021.08.002. Epub 2021 Aug 13.

TrialSieve: A Comprehensive Biomedical Information Extraction Framework for PICO, Meta-Analysis, and Drug Repurposing.

Bioengineering (Basel). 2025 May 2;12(5):486. doi: 10.3390/bioengineering12050486.

Clinical insights: A comprehensive review of language models in medicine.

PLOS Digit Health. 2025 May 8;4(5):e0000800. doi: 10.1371/journal.pdig.0000800. eCollection 2025 May.

High-precision information retrieval for rapid clinical guideline updates.

NPJ Digit Med. 2025 Apr 27;8(1):227. doi: 10.1038/s41746-025-01648-5.

SPIRIT-CONSORT-TM: a corpus for assessing transparency of clinical trial protocol and results publications.

Sci Data. 2025 Feb 28;12(1):355. doi: 10.1038/s41597-025-04629-1.

SPIRIT-CONSORT-TM: a corpus for assessing transparency of clinical trial protocol and results publications.

medRxiv. 2025 Jan 15:2025.01.14.25320543. doi: 10.1101/2025.01.14.25320543.

Semi-supervised learning from small annotated data and large unlabeled data for fine-grained Participants, Intervention, Comparison, and Outcomes entity recognition.

J Am Med Inform Assoc. 2025 Mar 1;32(3):555-565. doi: 10.1093/jamia/ocae326.

Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations.

Proc Conf Assoc Comput Linguist Meet. 2023 Jul;2023:9871-9889. doi: 10.18653/v1/2023.acl-long.549.

References

Aggregating and Predicting Sequence Labels from Crowd Annotations.

Proc Conf Assoc Comput Linguist Meet. 2017;2017:299-309. doi: 10.18653/v1/P17-1028.

Automating Biomedical Evidence Synthesis: RobotReviewer.

Proc Conf Assoc Comput Linguist Meet. 2017 Jul;2017:7-12. doi: 10.18653/v1/P17-4002.

Living systematic reviews: 2. Combining human and machine effort.

J Clin Epidemiol. 2017 Nov;91:31-37. doi: 10.1016/j.jclinepi.2017.08.011. Epub 2017 Sep 11.

An exploration of crowdsourcing citation screening for systematic reviews.

Res Synth Methods. 2017 Sep;8(3):366-386. doi: 10.1002/jrsm.1252. Epub 2017 Jul 4.

Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach.

J Am Med Inform Assoc. 2017 Nov 1;24(6):1165-1168. doi: 10.1093/jamia/ocx053.

Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry.

BMJ Open. 2017 Feb 27;7(2):e012545. doi: 10.1136/bmjopen-2016-012545.

Extracting PICO Sentences from Clinical Trial Reports using .

J Mach Learn Res. 2016;17.

A corpus of potentially contradictory research claims from cardiovascular research abstracts.

J Biomed Semantics. 2016 Jun 7;7:36. doi: 10.1186/s13326-016-0083-z.

Automating data extraction in systematic reviews: a systematic review.

Syst Rev. 2015 Jun 15;4:78. doi: 10.1186/s13643-015-0066-7.

Modernizing the systematic review process to inform comparative effectiveness: tools and methods.

J Comp Eff Res. 2013 May;2(3):273-82. doi: 10.2217/cer.13.17.
