Wu ChienHsing, Kao Shu-Chen, Shih Chia-Hung, Kan Meng-Hsuan
Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung 81148, Taiwan, ROC.
Department of Information Management, Kun Shan University, 195, Kunda Rd., YongKang Dist., Tainan, Taiwan, ROC.
Acta Trop. 2018 Jul;183:1-7. doi: 10.1016/j.actatropica.2018.03.017. Epub 2018 Mar 13.
Using a quantitative approach, this study examines whether data mining techniques can be applied to discover knowledge from open data related to Taiwan's dengue epidemic. We compare the results obtained when Google Trends data are included with those obtained when they are excluded. The data sources are government open data, climate data, and Google Trends data. The analysis covers 70,914 cases. Location and time (month) in the open data show the highest classification power, followed by the climate variables (temperature and humidity), whereas gender and age show the lowest. Both prediction accuracy and simplicity decrease when Google Trends data are included (0.94 and 0.37, respectively, compared with 0.96 and 0.46 when they are excluded). The article demonstrates the value of open data mining in the context of public health care.