Extraction of Explicit and Implicit Cause-Effect Relationships in Patient-Reported Diabetes-Related Tweets From 2017 to 2021: Deep Learning Approach.

Author information

Ahne Adrian, Khetan Vivek, Tannier Xavier, Rizvi Md Imbesat Hassan, Czernichow Thomas, Orchard Francisco, Bour Charline, Fano Andrew, Fagherazzi Guy

Affiliations

Center of Epidemiology and Population Health, Inserm, Hospital Gustave Roussy, Paris-Saclay University, Villejuif, France.

Epiconcept Company, Paris, France.

Publication information

JMIR Med Inform. 2022 Jul 19;10(7):e37201. doi: 10.2196/37201.

Abstract

BACKGROUND

Intervening in and preventing diabetes distress requires an understanding of its causes, in particular from the patient's perspective. Social media data provide direct access to how patients see and understand their disease and can therefore reveal the causes of diabetes distress.

OBJECTIVE

Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect relationships in patient-reported diabetes-related tweets and provide a methodology to better understand the opinions, feelings, and observations shared within the diabetes online community from a causality perspective.

METHODS

More than 30 million diabetes-related tweets in English were collected between April 2017 and January 2021. Deep learning and natural language processing methods were used to select tweets with personal and emotional content. A cause-effect tweet data set was manually labeled and used to train (1) a fine-tuned BERTweet model to detect sentences containing a causal relation and (2) a conditional random field model with Bidirectional Encoder Representations from Transformers (BERT)-based features to extract possible cause-effect associations. Causes and effects were clustered in a semisupervised approach and visualized in an interactive cause-effect network.
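The second step, extracting cause and effect spans from token-level predictions, can be illustrated with a minimal sketch. It assumes the sequence tagger emits BIO-style labels (`B-C`, `I-C`, `B-E`, `I-E`, `O`); that label scheme, the function name `extract_spans`, and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def extract_spans(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group token-level BIO tags into multiword cause and effect spans.

    Assumed (hypothetical) label scheme: B-C/I-C mark a cause,
    B-E/I-E mark an effect, and O marks tokens outside any span.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    kinds = {"C": "causes", "E": "effects"}
    spans: Dict[str, List[str]] = {"causes": [], "effects": []}
    current_kind = None
    current_tokens: List[str] = []

    # A trailing sentinel tagged "O" forces the last open span to be flushed.
    for token, tag in list(zip(tokens, tags)) + [("", "O")]:
        prefix, _, kind = tag.partition("-")
        continues = prefix == "I" and kind == current_kind
        if not continues:
            # Close the span that was open, if any.
            if current_kind is not None:
                spans[kinds[current_kind]].append(" ".join(current_tokens))
            current_kind, current_tokens = None, []
            # Open a new span; a stray I- tag is treated leniently as a start.
            if prefix in ("B", "I") and kind in kinds:
                current_kind = kind
        if current_kind is not None:
            current_tokens.append(token)
    return spans


tokens = ["High", "insulin", "prices", "cause", "stress"]
tags = ["B-C", "I-C", "I-C", "O", "B-E"]
result = extract_spans(tokens, tags)
print(result)
```

In this toy sentence the three tokens tagged as a cause are merged into one multiword cause, which is how a tagger of this kind can return both single-word and multiword causes and effects.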

RESULTS

Causal sentences were detected with a recall of 68% in an imbalanced data set. A conditional random field model with BERT-based features outperformed a fine-tuned BERT model for cause-effect detection, with a macro recall of 68%. This yielded 96,676 sentences with cause-effect relationships. "Diabetes" was identified as the central cluster, followed by "death" and "insulin." Insulin pricing-related causes were frequently associated with death.
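Macro recall averages the per-class recall with equal weight for every class, which makes it more informative than accuracy when one class dominates. The sketch below shows the computation on toy labels; the label names and values are invented for illustration and do not reproduce the paper's data or scores.

```python
from collections import Counter
from typing import Sequence


def macro_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    true_counts = Counter(y_true)
    # Count correct predictions per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[label] / count for label, count in true_counts.items()]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example: "O" dominates the label distribution.
y_true = ["O"] * 6 + ["cause", "cause", "effect", "effect"]
y_pred = ["O"] * 6 + ["cause", "O", "effect", "O"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, macro recall={score:.3f}")
```

Here the majority class is recalled perfectly while each minority class is recalled only half the time, so accuracy looks higher than macro recall.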

CONCLUSIONS

A novel methodology was developed to detect causal sentences and to identify explicit and implicit, single-word and multiword causes and their corresponding effects, as expressed in diabetes-related tweets. The methodology leverages BERT-based architectures, and the results are visualized as a cause-effect network. Causal associations extracted from real-life, patient-reported outcomes in social media data provide a useful complementary source of information in diabetes research.

