Popović Marko, Štefančić Hrvoje, Sluban Borut, Kralj Novak Petra, Grčar Miha, Mozetič Igor, Puliga Michelangelo, Zlatić Vinko
Theoretical Physics Division, Rudjer Bošković Institute, P.O.Box 180, HR-10002, Zagreb, Croatia.
Theoretical Physics Division, Rudjer Bošković Institute, P.O.Box 180, HR-10002, Zagreb, Croatia; Catholic University of Croatia, Zagreb, Croatia.
PLoS One. 2014 Dec 3;9(12):e99515. doi: 10.1371/journal.pone.0099515. eCollection 2014.
A stream of unstructured news can be a valuable source of hidden relations between different entities, such as financial institutions, countries, or persons. We present an approach to continuously collect online news, recognize the relevant entities in it, and extract time-varying networks. The nodes of the network are the entities, and the links are their co-occurrences. We present a method to estimate the significance of co-occurrences, and a benchmark model against which their robustness is evaluated. The approach is applied to a large set of financial news collected over a period of two years. The entities we consider are 50 countries that issue sovereign bonds, which are in turn insured by Credit Default Swaps (CDS). We compare the country co-occurrence networks to the CDS networks constructed from the correlations between the CDS. The results show a relatively small but significant overlap between the networks extracted from the news and those from the CDS correlations.
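As a rough illustration of the idea of a co-occurrence network with a significance filter, the sketch below counts how often pairs of entities appear in the same document and keeps only pairs whose co-occurrence count is unlikely under independence. The abstract does not specify the significance estimate or the benchmark model, so the hypergeometric test used here is an assumption, not the authors' method. The function names, the `alpha` threshold, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def cooccurrence_counts(docs):
    """Count entity occurrences and pairwise co-occurrences.

    docs: iterable of collections of entity names, one per document.
    Returns (entity_counts, pair_counts, n_docs).
    """
    entity_counts = Counter()
    pair_counts = Counter()
    n_docs = 0
    for entities in docs:
        n_docs += 1
        ents = sorted(set(entities))  # each entity counted once per document
        entity_counts.update(ents)
        pair_counts.update(combinations(ents, 2))
    return entity_counts, pair_counts, n_docs


def hypergeom_pvalue(n_docs, n_a, n_b, n_ab):
    """P(X >= n_ab) if the n_b documents mentioning B were drawn at random
    from n_docs documents, n_a of which mention A (assumed null model)."""
    total = comb(n_docs, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_docs - n_a, n_b - k)
        for k in range(n_ab, upper + 1)
    )
    return tail / total


def significant_links(docs, alpha=0.05):
    """Return {(a, b): p_value} for pairs with p_value below alpha."""
    entity_counts, pair_counts, n_docs = cooccurrence_counts(docs)
    links = {}
    for (a, b), n_ab in pair_counts.items():
        p = hypergeom_pvalue(n_docs, entity_counts[a], entity_counts[b], n_ab)
        if p < alpha:
            links[(a, b)] = p
    return links


def edge_overlap(edges_a, edges_b):
    """Jaccard overlap between two sets of undirected edges."""
    norm_a = {tuple(sorted(e)) for e in edges_a}
    norm_b = {tuple(sorted(e)) for e in edges_b}
    union = norm_a | norm_b
    return len(norm_a & norm_b) / len(union) if union else 0.0


# Toy example: hypothetical documents, not data from the study.
docs = [{"Greece", "Spain"}] * 4 + [{"Germany"}] * 6
links = significant_links(docs)
```

In this toy input, the two countries always appear together, so their link passes the filter. The `edge_overlap` helper shows one simple way two networks over the same nodes, such as a news network and a correlation-based network, could be compared. With many entity pairs, a multiple-testing correction would normally be applied to the threshold.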