Tenorio-Fornés Ámbar, Arroyo Javier, Hassan Samer
Decentralized Science, Decentralized Academy Ltd., Madrid, Spain.
Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid, Madrid, Spain.
PeerJ Comput Sci. 2022 Jan 3;8:e792. doi: 10.7717/peerj-cs.792. eCollection 2022.
Peer production online communities are groups of people who collaboratively build common resources such as wikis and open source projects. In such communities, participation is highly unequal: a few people concentrate most of the workload, while the rest contribute irregularly and sporadically. The distribution of participation is typically characterized as a power law. However, recent statistical studies of empirical data have challenged the dominance of the power law in other domains. This work critically examines the assumption that participation in wikis follows such a distribution. We use statistical tools to analyse over 6,000 wikis from Wikia/Fandom, the largest wiki repository. For each wiki, we study the empirical distribution of participation and compare it with several well-known skewed distributions. The results show that the power law performs poorly and is surpassed by three other distributions with more moderate heavy-tailed behavior. In particular, in 99.3% of the cases the truncated power law is either superior to all competing distributions or superior to some and as good as the rest. These findings can inform better modeling of participation in peer production and help produce more accurate predictions of the tail behavior, which represents the activity and frequency of the core contributors. Thus, we propose the truncated power law as the distribution for characterizing participation in wiki communities. Furthermore, the truncated power law parameters have a meaningful interpretation: they characterize the community in terms of the frequency of participation of occasional contributors and how unequal the group of core contributors is. Finally, we found a relationship between these parameters and both the productivity and the size of the community. These results open research avenues for the characterization of communities in wikis and in online peer production.
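The abstract names the central comparison (a pure power law against a truncated power law) without giving the estimation procedure. The sketch below takes "truncated power law" in the common sense of a power law with an exponential cutoff, p(x) ∝ x^(-α) e^(-λx), as in the `powerlaw` Python package, so the pure power law is the special case λ = 0. It assumes integer participation counts per contributor (e.g. edits), a fixed xmin = 1, and a crude grid search instead of a numerical optimizer. It fits both models by maximum likelihood and reports the log-likelihood ratio R (R > 0 favors the truncated power law) with a nested likelihood-ratio p-value. All function names, grids, and the synthetic data are hypothetical illustrations, not the authors' pipeline.

```python
import math
import random


def log_norm(alpha, lam, xmin=1, head_terms=50):
    """log Z, where Z = sum over integers x >= xmin of x**-alpha * exp(-lam * x).

    lam == 0 is the discrete power law (Hurwitz zeta; needs alpha > 1).
    lam > 0 is the truncated power law; the cutoff below assumes alpha >= 0.
    """
    if lam == 0:
        if alpha <= 1:
            raise ValueError("a pure power law needs alpha > 1")
        head = math.fsum((xmin + k) ** -alpha for k in range(head_terms))
        a = xmin + head_terms
        # Euler-Maclaurin estimate of the remaining sum over x >= a.
        tail = (a ** (1 - alpha) / (alpha - 1) + a ** -alpha / 2
                + alpha * a ** (-alpha - 1) / 12
                - alpha * (alpha + 1) * (alpha + 2) * a ** (-alpha - 3) / 720)
        return math.log(head + tail)
    # Every term beyond xmin + 40/lam is below exp(-40) times the first term.
    stop = xmin + math.ceil(40 / lam)
    return math.log(math.fsum(math.exp(-alpha * math.log(x) - lam * x)
                              for x in range(xmin, stop)))


def fit_and_compare(data, xmin=1, alphas=None, lams=None):
    """Grid-search MLE of both models plus their log-likelihood ratio R."""
    xs = [x for x in data if x >= xmin]
    n = len(xs)
    sum_log = math.fsum(math.log(x) for x in xs)
    sum_x = math.fsum(xs)
    if alphas is None:
        alphas = [i / 10 for i in range(41)]                   # 0.0 .. 4.0
    if lams is None:
        lams = [0.0] + [10 ** (e / 4) for e in range(-10, 1)]  # 0, ~0.003 .. 1

    def loglik(alpha, lam):
        return -alpha * sum_log - lam * sum_x - n * log_norm(alpha, lam, xmin)

    # The power law is the lam == 0 special case of the truncated power law,
    # so both searches share that row and R = ll_tpl - ll_pl is never negative.
    pl = max((loglik(a, 0.0), a) for a in alphas if a > 1)
    tpl = max((loglik(a, lam), a, lam) for a in alphas for lam in lams
              if lam > 0 or a > 1)
    r = max(tpl[0] - pl[0], 0.0)
    # Nested likelihood-ratio test: 2R ~ chi-squared(1) if the power law holds.
    # lam = 0 lies on the parameter boundary, so this p-value is conservative.
    p = math.erfc(math.sqrt(r))
    return {"power_law": {"alpha": pl[1], "loglik": pl[0]},
            "truncated_power_law": {"alpha": tpl[1], "lambda": tpl[2],
                                    "loglik": tpl[0]},
            "R": r, "p": p}


def sample_truncated(alpha, lam, n, xmin=1, seed=0):
    """Draw n synthetic integer counts from a truncated power law (for testing)."""
    rng = random.Random(seed)
    support = range(xmin, xmin + math.ceil(40 / lam))
    weights = [x ** -alpha * math.exp(-lam * x) for x in support]
    return rng.choices(support, weights=weights, k=n)


if __name__ == "__main__":
    counts = sample_truncated(alpha=1.5, lam=0.1, n=2000, seed=42)
    print(fit_and_compare(counts))
```

Running the module fits a synthetic sample drawn with α = 1.5 and λ = 0.1. The grid (steps of 0.1 in α, quarter-decades in λ) limits precision; a real analysis would also choose xmin from the data and use a numerical optimizer rather than a grid.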