Upscaling human activity data: A statistical ecology approach.

Author affiliations

Dipartimento di Fisica e Astronomia "Galileo Galilei", Istituto Nazionale di Fisica Nucleare, Università degli Studi di Padova, Padova, Italy.

Dipartimento di Matematica "Tullio Levi-Civita", Università degli Studi di Padova, Padova, Italy.

Publication information

PLoS One. 2021 Jul 1;16(7):e0253461. doi: 10.1371/journal.pone.0253461. eCollection 2021.

Abstract

Big data require new techniques to handle the information they carry. Here we consider four datasets (email communication, Twitter posts, Wikipedia articles and Gutenberg books) and propose a novel statistical framework to predict global statistics from random samples. More precisely, from a small sample of sent emails per sender, posts per hashtag and word occurrences, we infer the number of senders, hashtags and words in the whole dataset and how their abundances (e.g. the popularity of a hashtag) change across scales. Our approach is grounded in statistical ecology: we map the inference of human activities onto the unseen species problem in biodiversity. Our findings may have applications to resource management in email systems, collective attention monitoring on Twitter and language learning processes in word databases.
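
To make the unseen species analogy concrete, here is a minimal sketch of how the number of distinct items (senders, hashtags or words) in a full dataset might be estimated from a random sample. It uses the bias-corrected Chao1 estimator, a standard lower-bound estimator from statistical ecology. This is an illustrative stand-in and is not claimed to be the framework proposed in the paper. The function name `chao1` and the toy hashtag sample are assumptions made up for this example.

```python
from collections import Counter


def chao1(sample):
    """Bias-corrected Chao1 estimate of the total number of distinct items.

    Illustrative stand-in only: NOT the estimator proposed in the paper.
    `sample` is any iterable of hashable labels (e.g. hashtags).
    """
    abundances = Counter(sample)                 # label -> occurrences in the sample
    freq_of_freq = Counter(abundances.values())  # k -> number of labels seen exactly k times
    s_obs = len(abundances)                      # distinct labels observed
    f1 = freq_of_freq.get(1, 0)                  # singletons (seen once)
    f2 = freq_of_freq.get(2, 0)                  # doubletons (seen twice)
    # Many singletons relative to doubletons suggest many labels are still unseen.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical toy sample of hashtags: 6 distinct, 3 singletons, 2 doubletons.
sample = ["#a", "#a", "#a", "#b", "#b", "#c", "#d", "#e", "#e", "#f"]
estimate = chao1(sample)
print(estimate)  # 7.0
```

In this toy case the estimator adds one unseen hashtag to the six observed. Chao1 only estimates richness; it says nothing about how abundances change across scales, which the abstract names as a separate goal of the paper's framework.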


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0892/8248688/86d345606dbf/pone.0253461.g001.jpg
