Overview of the 8th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at the AMIA 2023 Annual Symposium.

Authors

Klein Ari Z, Banda Juan M, Guo Yuting, Schmidt Ana Lucia, Xu Dongfang, Amaro Jesus Ivan Flores, Rodriguez-Esteban Raul, Sarker Abeed, Gonzalez-Hernandez Graciela

Affiliations

Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA.

Department of Computer Science, Georgia State University, Atlanta, GA, USA.

Publication information

medRxiv. 2023 Nov 8:2023.11.06.23298168. doi: 10.1101/2023.11.06.23298168.

Abstract

The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of five tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events). In total, 29 teams registered, representing 18 countries. In this paper, we present the annotated corpora, a technical summary of the systems, and the performance results. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora. To facilitate future work, the datasets (a total of 61,353 posts) will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.

