Estimating influenza incidence using search query deceptiveness and generalized ridge regression.

Affiliations

Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America.

University of Colorado Boulder, Boulder, Colorado, United States of America.

Publication information

PLoS Comput Biol. 2019 Oct 1;15(10):e1007165. doi: 10.1371/journal.pcbi.1007165. eCollection 2019 Oct.

Abstract

Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically selected features perform as well as or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.
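
The general idea of letting deceptiveness knowledge inform a linear regression can be illustrated with a small sketch. The abstract does not state the authors' exact formulation, so the following is only an assumed, minimal form of generalized ridge regression in which each feature gets its own penalty, and features believed to be more deceptive are penalized more heavily. The function name `generalized_ridge`, the deceptiveness scores, the scaling constant, and the synthetic data are all hypothetical and chosen purely for illustration; no intercept is fitted.

```python
import numpy as np


def generalized_ridge(X, y, penalties):
    """Closed-form solution of min ||y - X b||^2 + sum_j penalties[j] * b[j]^2.

    Illustrative sketch only; not claimed to be the paper's exact model.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # One penalty per feature on the diagonal (ordinary ridge uses a single value).
    penalty_matrix = np.diag(np.asarray(penalties, dtype=float))
    # Normal equations: (X^T X + Lambda) b = X^T y
    return np.linalg.solve(X.T @ X + penalty_matrix, X.T @ y)


# Synthetic example (made-up data, not influenza surveillance data).
rng = np.random.default_rng(0)
n_weeks = 200
incidence = rng.gamma(shape=2.0, scale=1.0, size=n_weeks)

# One feature that tracks incidence and one that is unrelated to it.
informative = incidence + rng.normal(scale=0.3, size=n_weeks)
unrelated = rng.normal(size=n_weeks)
X = np.column_stack([informative, unrelated])

# Hypothetical deceptiveness scores in [0, 1]; higher means less trustworthy.
deceptiveness = np.array([0.1, 0.9])
penalties = 50.0 * deceptiveness  # assumed scaling, for illustration only

coef_weighted = generalized_ridge(X, incidence, penalties)
coef_unpenalized = generalized_ridge(X, incidence, np.zeros(2))
print("deceptiveness-weighted coefficients:", coef_weighted)
print("unpenalized coefficients:", coef_unpenalized)
```

With all penalties set to zero the function reduces to ordinary least squares, and with equal penalties it reduces to standard ridge regression, so the per-feature penalties are the only place where deceptiveness information enters in this sketch.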

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b16/6771994/b439aff3be92/pcbi.1007165.g001.jpg
