DMI Department, University of Palermo, Palermo, Italy.
MIFT Department, University of Messina, Messina, Italy.
PLoS One. 2021 Aug 11;16(8):e0255067. doi: 10.1371/journal.pone.0255067. eCollection 2021.
Data collected in criminal investigations may suffer from several issues: (i) incompleteness, due to the covert nature of criminal organizations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; (iii) inconsistency, when the same information is entered into law enforcement databases multiple times or in different formats. In this paper we analyze nine real criminal networks of different kinds (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are first pruned using two methods: (i) random edge removal, simulating the scenario in which Law Enforcement Agencies fail to intercept some calls or to spot sporadic meetings among suspects; (ii) node removal, modeling the situation in which some suspects cannot be intercepted or investigated. We then compute spectral distances (i.e., Adjacency, Laplacian and normalized Laplacian Spectral Distances) and matrix distances (i.e., Root Euclidean Distance) between the complete and pruned networks, and compare them using statistical analysis. Our investigation identifies two main findings: first, the overall understanding of the criminal networks remains high even with incomplete data on criminal interactions (i.e., when 10% of edges are removed); second, leaving even a small fraction of suspects uninvestigated (i.e., when 2% of nodes are removed) may lead to significant misinterpretation of the overall network.
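The edge-pruning step and one of the spectral distances can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy six-node network, the 20% removal fraction and the fixed random seed are all assumptions made for illustration. The adjacency spectral distance is taken here as the Euclidean distance between the sorted eigenvalue vectors of the two adjacency matrices, which is one common definition and may differ in detail from the one used in the paper.

```python
import random
import numpy as np

def remove_random_edges(adj, fraction, rng):
    """Return a copy of the symmetric 0/1 matrix `adj` with a random
    `fraction` of its undirected edges deleted (hypothetical helper)."""
    pruned = adj.copy()
    rows, cols = np.triu_indices_from(adj, k=1)   # upper triangle: each edge once
    edges = [(i, j) for i, j in zip(rows, cols) if adj[i, j] != 0]
    n_remove = round(fraction * len(edges))
    for i, j in rng.sample(edges, n_remove):
        pruned[i, j] = pruned[j, i] = 0           # keep the matrix symmetric
    return pruned

def adjacency_spectral_distance(a, b):
    """Euclidean distance between the ascending adjacency spectra of two
    graphs on the same node set (assumed definition)."""
    ev_a = np.linalg.eigvalsh(a)                  # eigvalsh returns ascending order
    ev_b = np.linalg.eigvalsh(b)
    return float(np.linalg.norm(ev_a - ev_b))

# Toy example: a complete graph on 6 nodes (15 edges), not real data.
rng = random.Random(0)
full = np.ones((6, 6)) - np.eye(6)
pruned = remove_random_edges(full, 0.2, rng)      # drops 3 of the 15 edges
distance = adjacency_spectral_distance(full, pruned)
```

Repeating the pruning over many random draws and removal fractions would give the distribution of distances that a statistical comparison needs. Node removal is not shown, since comparing spectra of matrices of different sizes requires a further convention that the abstract does not specify.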