Targeting COVID-19 and Human Resources for Health News Information Extraction: Algorithm Development and Validation.

Authors

Ravaut Mathieu, Zhao Ruochen, Phung Duy, Qin Vicky Mengqi, Milovanovic Dusan, Pienkowska Anita, Bojic Iva, Car Josip, Joty Shafiq

Affiliations

Nanyang Technological University, Singapore, Singapore.

Episteme Systems, Geneva, Switzerland.

Publication information

JMIR AI. 2024 Oct 30;3:e55059. doi: 10.2196/55059.

Abstract

BACKGROUND

Global pandemics such as COVID-19 place heavy strain on health care systems and health workers worldwide. These crises generate a vast volume of news published online across the globe. This extensive corpus of articles has the potential to provide valuable insights into the nature of ongoing events and to guide interventions and policies. However, the sheer volume of information is beyond the capacity of human experts to process and analyze effectively.

OBJECTIVE

The aim of this study was to explore how natural language processing (NLP) can be leveraged to build a system that allows for quick analysis of a high volume of news articles. A further objective was to create a workflow based on human-computer symbiosis that derives valuable insights to support health workforce strategic policy dialogue, advocacy, and decision-making.

METHODS

We reviewed open-source news coverage of COVID-19 and its impacts on the health workforce, published from January 2020 to June 2022 and collected from the World Health Organization (WHO) Epidemic Intelligence from Open Sources (EIOS) system. The review combined NLP models, including classification and extractive summarization, with human-generated analyses. Our DeepCovid system was trained on 2.8 million news articles in English from more than 3000 internet sources across hundreds of jurisdictions.
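To make the two NLP steps named above concrete, here is a minimal sketch of a keyword filter followed by a frequency-based extractive summarizer. It is an illustration only, not DeepCovid's method: the abstract does not describe the models, so the term lists, the function names `is_relevant` and `summarize`, and the scoring heuristic are all assumptions.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the study's actual rules are not given in the abstract.
TOPIC_TERMS = {"covid-19", "coronavirus", "pandemic"}
WORKFORCE_TERMS = {"nurse", "doctor", "health worker", "physician"}


def is_relevant(text):
    """Keep an article only if it mentions both the topic and the health workforce."""
    lowered = text.lower()
    has_topic = any(term in lowered for term in TOPIC_TERMS)
    has_workforce = any(term in lowered for term in WORKFORCE_TERMS)
    return has_topic and has_workforce


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Extractive summary: return the highest-scoring sentences in original order.

    A sentence's score is the mean document frequency of its words, ignoring
    words of three letters or fewer as a crude stop-word filter (an assumption).
    """
    sentences = split_sentences(text)
    words = re.findall(r"[a-z0-9-]+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r"[a-z0-9-]+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Extractive summarization, as opposed to abstractive summarization, only selects sentences that already appear in the article, so every line of the output can be traced back to its source. That property suits a workflow in which human experts write the final report.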

RESULTS

Classification with hand-designed rules narrowed the data set to 8508 articles, whose high relevance was confirmed in the human-led evaluation. DeepCovid's automated information targeting component reached a very strong binary classification performance of 98.98 for the area under the receiver operating characteristic curve (ROC-AUC) and 47.21 for the area under the precision-recall curve (PR-AUC). Its information extraction component attained good performance in automatic extractive summarization with a mean Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score of 47.76. DeepCovid's final summaries were used by human experts to write reports on the COVID-19 pandemic.
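For readers unfamiliar with the three metrics reported above, the following sketch computes each one from scratch on invented toy data. It is not the study's evaluation code. Average precision is used here as a common step-wise estimate of PR-AUC, and only the unigram variant ROUGE-1 is shown, because the abstract does not say which ROUGE variants were averaged. The functions return values in the range 0 to 1, whereas the abstract reports scores multiplied by 100.

```python
from collections import Counter


def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive.

    Assumes at least one positive; tied scores are ranked in input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): 100 articles, only 2 relevant, ranked 3rd and 5th by the model.
scores = list(range(100, 0, -1))
labels = [0] * 100
labels[2] = 1
labels[4] = 1

print(round(roc_auc(labels, scores), 3))            # 0.974
print(round(average_precision(labels, scores), 3))  # 0.367
print(round(rouge1_f1("nurses face staff shortages",
                      "nurses face severe staff shortages"), 3))  # 0.889
```

The toy ranking shows how the two classification metrics can diverge when relevant items are rare: a handful of negatives ranked above the positives barely moves ROC-AUC but cuts precision sharply. Whether this accounts for the gap between the reported 98.98 and 47.21 cannot be determined from the abstract alone.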

CONCLUSIONS

It is feasible to synergize high-performing NLP models and human-generated analyses to benefit open-source health workforce intelligence. The DeepCovid approach can contribute to an agile and timely global view, providing complementary information to scientific literature.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c6/11561429/e2513503e033/ai_v3i1e55059_fig1.jpg
