Medical Informatics, Kaiser Permanente Southern California, San Diego, CA, 92130, USA.
BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):79. doi: 10.1186/s12911-019-0785-0.
Twitter messages (tweets) cover a wide range of everyday topics, including health-related ones. Analyzing health-related tweets can help us understand the health conditions and concerns people encounter in daily life. In this paper, we evaluate an approach to extracting causalities from tweets using natural language processing (NLP) techniques.
Lexico-syntactic patterns based on dependency parser outputs are used for causality extraction. We focused on three health-related topics: "stress", "insomnia", and "headache". A large dataset consisting of 24 million tweets was used.
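To illustrate what a lexico-syntactic pattern over dependency parser output can look like, the following is a minimal sketch and not the authors' implementation. It assumes a single hypothetical pattern (a causal verb whose `nsubj` dependent is the cause and whose `dobj` dependent is one of the target topics), a hand-written parse in place of real parser output, and illustrative names (`Token`, `CAUSAL_VERBS`, `extract_causalities`) that do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # position of the token in the sentence
    lemma: str  # lower-cased lemma
    dep: str    # dependency label linking the token to its head
    head: int   # index of the head token (the root points to itself)


# Assumption: a tiny illustrative verb list, not the pattern set used in the paper.
CAUSAL_VERBS = {"cause", "trigger"}
TOPICS = {"stress", "insomnia", "headache"}


def extract_causalities(tokens, topics=TOPICS):
    """Return (cause, effect) lemma pairs matching: nsubj <- causal verb -> dobj."""
    pairs = []
    for verb in tokens:
        if verb.lemma not in CAUSAL_VERBS:
            continue
        # Collect the direct dependents of the causal verb.
        children = [t for t in tokens if t.head == verb.idx and t.idx != verb.idx]
        causes = [t for t in children if t.dep == "nsubj"]
        effects = [t for t in children if t.dep == "dobj" and t.lemma in topics]
        pairs.extend((c.lemma, e.lemma) for c in causes for e in effects)
    return pairs


# Hand-written parse of "Work causes stress" (stands in for real parser output).
example_parse = [
    Token(0, "work", "nsubj", 1),
    Token(1, "cause", "ROOT", 1),
    Token(2, "stress", "dobj", 1),
]

print(extract_causalities(example_parse))
```

Matching on dependency relations rather than on surface word order lets one pattern cover sentences in which modifiers separate the cause, the verb, and the effect. A practical system would need many more patterns, including ones where the topic is the cause rather than the effect.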
The results show that the proposed approach achieved an average precision between 74.59% and 92.27% when compared with human annotations.
Manual analysis of the causalities extracted from tweets reveals interesting findings about how Twitter users express themselves on health-related topics.