Yamada Kenta, Takayasu Hideki, Takayasu Misako
Institute of Innovative Research, Tokyo Institute of Technology, 4259, Nagatsuta-cho, Yokohama 226-8502, Japan.
Sony Computer Science Laboratories, 3-14-13, Higashi-Gotanda, Shinagawa-ku, Tokyo 141-0022, Japan.
Entropy (Basel). 2018 Nov 6;20(11):852. doi: 10.3390/e20110852.
We introduce a systematic method for estimating an economic indicator published by the Japanese government by analyzing a large corpus of Japanese blog data. The explanatory variables are monthly word frequencies. As candidate words for explaining the economic index, we adopt the 1352 words in the economics and industry section of the Nikkei thesaurus. From this large set of words, our method automatically selects those that correlate strongly with the economic indicator, and it addresses statistical difficulties such as spurious correlation and overfitting. As a result, our model reproduces the real economic index reasonably well. Government announcements of an economic index usually come with a time lag, whereas our proposed method can produce estimates in real time.
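The word-selection idea can be sketched in a few lines. The following is a minimal illustration only, not the authors' procedure: the abstract does not say how spurious correlation and overfitting are handled, so month-over-month differencing (to remove shared trends) and a held-out sign check (as a guard against overfitting) are assumed stand-ins. The function names, the threshold of 0.5, the 12-month hold-out, and the toy series are all hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def diff(series):
    """Month-over-month changes; removes a shared linear trend (assumed remedy)."""
    return [b - a for a, b in zip(series, series[1:])]


def select_words(freqs, indicator, threshold=0.5, holdout=12):
    """Keep words whose differenced frequency tracks the differenced indicator.

    freqs: dict mapping word -> list of monthly frequencies.
    indicator: list of monthly indicator values, same length as each series.
    A word is kept if |r| >= threshold on the training months and the
    correlation has the same sign on the held-out months.
    """
    d_ind = diff(indicator)
    split = len(d_ind) - holdout
    selected = []
    for word, series in freqs.items():
        d_freq = diff(series)
        r_train = pearson(d_freq[:split], d_ind[:split])
        r_test = pearson(d_freq[split:], d_ind[split:])
        if abs(r_train) >= threshold and r_train * r_test > 0:
            selected.append((word, r_train))
    # Strongest correlations first
    return sorted(selected, key=lambda item: -abs(item[1]))


# Toy data (hypothetical): 60 months, upward trend plus a repeating pattern.
pattern = [1, -2, 3, -1, 2, -3]
indicator = [100 + 0.5 * t + pattern[t % 6] for t in range(60)]
freqs = {
    "related": [10 + 2 * v for v in indicator],   # moves with the indicator
    "trend_only": [5 + 3 * t for t in range(60)],  # shares only the trend
}

selected = select_words(freqs, indicator)
print(selected)
```

In this toy setup the `trend_only` series correlates strongly with the indicator in levels purely because both rise over time, yet it is dropped after differencing, while `related` is kept. A real pipeline would also need to normalize word counts by total blog volume and fit a regression on the selected words, neither of which is shown here.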