Mowery Jared
The MITRE Corporation.
Online J Public Health Inform. 2016 Dec 28;8(3):e198. doi: 10.5210/ojphi.v8i3.7011. eCollection 2016.
Influenza (flu) surveillance using Twitter data can potentially save lives and increase efficiency by providing governments and healthcare organizations with greater situational awareness. However, research is needed to determine the impact of Twitter users' misdiagnoses on surveillance.
This study establishes the importance of Twitter users' misdiagnoses by showing that Twitter flu surveillance in the United States failed during the 2011-2012 flu season, estimates the extent of misdiagnoses, and tests several methods for reducing the adverse effects of misdiagnoses.
Metrics representing flu prevalence, seasonal misdiagnosis patterns, diagnosis uncertainty, flu symptoms, and noise were produced using Twitter data in conjunction with OpenSextant for geo-inferencing, and a maximum entropy classifier for identifying tweets related to illness. These metrics were tested for correlations with World Health Organization (WHO) positive specimen counts of flu from 2011 to 2014.
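The correlation test described above can be illustrated with a minimal sketch of the Pearson correlation coefficient between a weekly Twitter-derived metric and weekly positive specimen counts. The function name `pearson`, the variable names, and all numbers below are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly values, invented for illustration only.
weekly_flu_tweets = [120, 150, 310, 480, 450, 260, 140]
weekly_who_positives = [30, 45, 110, 200, 210, 120, 50]

r = pearson(weekly_flu_tweets, weekly_who_positives)
print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 would indicate that the Twitter metric rises and falls with the specimen counts; it does not by itself show that the two series peak in the same week, which is why a late-peaking season can still be misjudged.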
Twitter flu surveillance erroneously indicated a typical flu season during 2011-2012, even though the flu season peaked three months late, and erroneously indicated plateaus of flu tweets before the 2012-2013 and 2013-2014 flu seasons. Enhancements based on estimates of misdiagnoses removed the erroneous plateaus and increased the Pearson correlation coefficients by 0.04 and 0.23, but failed to correct the 2011-2012 flu season estimate. A rough estimate indicates that approximately 40% of flu tweets reflected misdiagnoses.
Further research into factors affecting Twitter users' misdiagnoses, in conjunction with data from additional atypical flu seasons, is needed to enable Twitter flu surveillance systems to produce reliable estimates during atypical flu seasons.