School of Informatics and Computing, Indiana University, Bloomington, IN 47403, USA.
Proc Natl Acad Sci U S A. 2010 Dec 28;107(52):22436-41. doi: 10.1073/pnas.1006155107. Epub 2010 Dec 8.
We investigate the extent to which social ties between people can be inferred from co-occurrence in time and space: Given that two people have been in approximately the same geographic locale at approximately the same time, on multiple occasions, how likely are they to know each other? Furthermore, how does this likelihood depend on the spatial and temporal proximity of the co-occurrences? Such issues arise in data originating in both online and offline domains as well as settings that capture interfaces between online and offline behavior. Here we develop a framework for quantifying the answers to such questions, and we apply this framework to publicly available data from a social media site, finding that even a very small number of co-occurrences can result in a high empirical likelihood of a social tie. We then present probabilistic models showing how such large probabilities can arise from a natural model of proximity and co-occurrence in the presence of social ties. In addition to providing a method for establishing some of the first quantifiable estimates of these measures, our findings have potential privacy implications, particularly for the ways in which social structures can be inferred from public online records that capture individuals' physical locations over time.
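The abstract does not spell out how a co-occurrence is measured, so the following is only a minimal sketch under stated assumptions: the map is divided into square cells of `cell_deg` degrees, two people co-occur in a cell if they were both recorded there within `window_days` of each other, and the empirical likelihood of a tie is the fraction of pairs with at least `k` co-occurrences that are known to be connected. The function names, parameters, and toy records are hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def cell_of(lat, lon, cell_deg):
    """Map a coordinate to the index of a square grid cell (assumed discretization)."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def co_occurrence_counts(records, cell_deg=1.0, window_days=1.0):
    """Count, per pair of users, the distinct cells in which both were
    recorded within window_days of each other.

    records: iterable of (user, lat, lon, time_in_days).
    Returns a dict keyed by alphabetically sorted (user_a, user_b) tuples.
    """
    by_cell = defaultdict(lambda: defaultdict(list))
    for user, lat, lon, t in records:
        by_cell[cell_of(lat, lon, cell_deg)][user].append(t)

    counts = defaultdict(int)
    for visits in by_cell.values():
        for a, b in combinations(sorted(visits), 2):
            # One co-occurrence per cell if any two visits are close in time.
            if any(abs(ta - tb) <= window_days
                   for ta in visits[a] for tb in visits[b]):
                counts[(a, b)] += 1
    return dict(counts)


def tie_probability(counts, ties, k):
    """Fraction of pairs with at least k co-occurrences that share a tie.

    ties: set of alphabetically sorted (user_a, user_b) tuples.
    Returns None when no pair reaches k co-occurrences.
    """
    pairs = [p for p, c in counts.items() if c >= k]
    if not pairs:
        return None
    return sum(1 for p in pairs if p in ties) / len(pairs)


# Toy data, invented purely for illustration.
records = [
    ("alice", 40.7, -73.9, 0.0),
    ("bob", 40.2, -73.5, 0.5),
    ("carol", 40.5, -73.2, 0.2),
    ("alice", 51.5, 0.1, 10.0),
    ("bob", 51.6, 0.3, 10.8),
    ("dave", 51.5, 0.2, 30.0),
]
ties = {("alice", "bob")}
counts = co_occurrence_counts(records)
```

Sweeping `cell_deg` and `window_days` over a range of values and calling `tie_probability(counts, ties, k)` for increasing `k` gives one way to examine how the likelihood of a tie depends on spatial and temporal proximity, which is the question the abstract poses.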