Siri Anna, Khabbache Hicham, Al-Jafar Ali, Martini Mariano, Brigo Francesco, Bragazzi Nicola Luigi
Department of Mathematics (DIMA), University of Genoa, Genoa, Italy; UNESCO CHAIR "Anthropology of Health - Biosphere and Healing System", University of Genoa, Genoa, Italy.
Laboratoire Etudes théologiques, Sciences Cognitives et Sociales, Faculty of Literature and Humanistic Studies, Sais, Sidi Mohamed Ben Abdellah University, Fez, Morocco.
Data Brief. 2016 Oct 13;9:679-684. doi: 10.1016/j.dib.2016.09.032. eCollection 2016 Dec.
The present data article describes high-school drop-out related web activities in Canada from 2004 to 2012, obtained by mining Google Trends (GT) using "high-school drop-out" as the keyword. The search volumes were processed, correlated and cross-correlated with statistical data obtained at the national and provincial level and broken down by gender. Further, an autoregressive moving-average (ARMA) model was used to model the GT-generated data. From a qualitative point of view, GT-generated relative search volumes (RSVs) reflect the decrease in the drop-out rate. Internet-related activity peaks in 2004 (56.35%, normalized value) and gradually declines to 40.59% (normalized value) in 2007. Thereafter, it remains substantially stable until 2012 (40.32%, normalized value). From a quantitative standpoint, the correlations between the Canadian high-school drop-out rate and GT-generated RSVs in the study period (2004-2012) were statistically significant using both the drop-out rate per academic year and the 3-year moving average. When the data were broken down by gender, the correlations were higher, and statistically significant, in males than in females. GT-based data for drop-out were best modeled by an ARMA(1,0) model. Considering the cross-correlations for the Canadian regions, all were statistically significant at lag 0, except for New Brunswick, Newfoundland and Labrador, and Prince Edward Island. A number of cross-correlations were also statistically significant at lag -1 (namely, Alberta, Manitoba, New Brunswick and Saskatchewan).
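As an illustration of the two techniques named in the abstract, the following Python sketch fits an ARMA(1,0) model (equivalently, an AR(1) model) and computes a lagged cross-correlation. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names `fit_ar1` and `cross_correlation` are hypothetical, ordinary least squares stands in for whatever estimator the authors used, the sign convention for the lag is an assumption, and the series is synthetic rather than the GT or drop-out data.

```python
import numpy as np


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1} + e_t, i.e. ARMA(1,0).

    Assumption: ordinary least squares is used here for simplicity;
    the source does not state which estimator was applied.
    """
    x = np.asarray(series, dtype=float)
    prev, curr = x[:-1], x[1:]
    # Design matrix: an intercept column and the lagged values.
    design = np.column_stack([np.ones_like(prev), prev])
    (c, phi), *_ = np.linalg.lstsq(design, curr, rcond=None)
    return c, phi


def cross_correlation(x, y, lag):
    """Pearson correlation between x_t and y_{t+lag}.

    Assumption: with this convention, lag = -1 pairs each x value
    with the y value one step earlier.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]


# Synthetic AR(1) series for demonstration only (not the study data).
rng = np.random.default_rng(0)
true_c, true_phi, n = 2.0, 0.6, 2000
synthetic = np.empty(n)
synthetic[0] = true_c / (1.0 - true_phi)  # start at the process mean
for t in range(1, n):
    synthetic[t] = true_c + true_phi * synthetic[t - 1] + rng.normal()

c_hat, phi_hat = fit_ar1(synthetic)
print("estimated intercept and AR coefficient:", c_hat, phi_hat)

# A second synthetic series that runs one step ahead of the first,
# so the two line up exactly at lag -1.
leading = np.roll(synthetic, -1)
print("cross-correlation at lag 0:", cross_correlation(synthetic, leading, 0))
print("cross-correlation at lag -1:", cross_correlation(synthetic, leading, -1))
```

With only nine annual observations, as in the 2004-2012 study period, estimates of this kind would carry wide uncertainty, so a dedicated time-series package that reports standard errors may be preferable in practice.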