Task-specific information outperforms surveillance-style big data in predictive analytics.

Affiliations

Department of Economics, University of Copenhagen, 1353 Copenhagen, Denmark.

Center for Social Data Science, University of Copenhagen, 1353 Copenhagen, Denmark.

Publication information

Proc Natl Acad Sci U S A. 2021 Apr 6;118(14). doi: 10.1073/pnas.2020258118.

DOI: 10.1073/pnas.2020258118

PMID: 33790010

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8040817/

Abstract

Increasingly, human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations. In the context of higher education, a growing number of schools and universities collect data on their students with the purpose of assessing or predicting behaviors and academic performance, and the COVID-19-induced move to online education dramatically increases what can be accumulated in this way, raising concerns about students' privacy. We focus on academic performance and ask whether predictive performance for a given dataset can be achieved with less privacy-invasive, but more task-specific, data. We draw on a unique dataset on a large student population containing both highly detailed measures of behavior and personality and high-quality third-party reported individual-level administrative data. We find that models estimated using the big behavioral data are indeed able to accurately predict academic performance out of sample. However, models using only low-dimensional and arguably less privacy-invasive administrative data perform considerably better and, importantly, do not improve when we add the high-resolution, privacy-invasive behavioral data. We argue that combining big behavioral data with "ground truth" administrative registry data can ideally allow the identification of privacy-preserving task-specific features that can be employed instead of current indiscriminate troves of behavioral data, with better privacy and better prediction resulting.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a5be/8040817/6553fce0c601/pnas.2020258118fig01.jpg

Similar articles

Task-specific information outperforms surveillance-style big data in predictive analytics.

Proc Natl Acad Sci U S A. 2021 Apr 6;118(14). doi: 10.1073/pnas.2020258118.

Association between medical students' prior experiences and perceptions of formal online education developed in response to COVID-19: a cross-sectional study in China.

BMJ Open. 2020 Oct 29;10(10):e041886. doi: 10.1136/bmjopen-2020-041886.

The Influence of Coronavirus Diseases 2019 (COVID-19) Pandemic and the Quarantine Practices on University Students' Beliefs About the Online Learning Experience in Jordan.

Front Public Health. 2021 Jan 13;8:595874. doi: 10.3389/fpubh.2020.595874. eCollection 2020.

Students' and lecturers' perspective on the implementation of online learning in dental education due to SARS-CoV-2 (COVID-19): a cross-sectional study.

BMC Med Educ. 2020 Oct 9;20(1):354. doi: 10.1186/s12909-020-02266-3.

Containing COVID-19 Among 627,386 Persons in Contact With the Diamond Princess Cruise Ship Passengers Who Disembarked in Taiwan: Big Data Analytics.

J Med Internet Res. 2020 May 5;22(5):e19540. doi: 10.2196/19540.

Distance learning in clinical medical education amid COVID-19 pandemic in Jordan: current situation, challenges, and perspectives.

BMC Med Educ. 2020 Oct 2;20(1):341. doi: 10.1186/s12909-020-02257-4.

Predicting student outcomes using digital logs of learning behaviors: Review, current standards, and suggestions for future work.

Behav Res Methods. 2023 Sep;55(6):3026-3054. doi: 10.3758/s13428-022-01939-9. Epub 2022 Aug 26.

Student Attitudes toward Learning Analytics in Higher Education: "…"

Front Psychol. 2016 Dec 19;7:1959. doi: 10.3389/fpsyg.2016.01959. eCollection 2016.

The sudden transition to synchronized online learning during the COVID-19 pandemic in Saudi Arabia: a qualitative study exploring medical students' perspectives.

BMC Med Educ. 2020 Aug 28;20(1):285. doi: 10.1186/s12909-020-02208-z.

Attitudes and concerns of undergraduate university health sciences students in Croatia regarding complete switch to e-learning during COVID-19 pandemic: a survey.

BMC Med Educ. 2020 Nov 10;20(1):416. doi: 10.1186/s12909-020-02343-7.

Cited by

The origins of unpredictability in life outcome prediction tasks.

Proc Natl Acad Sci U S A. 2024 Jun 11;121(24):e2322973121. doi: 10.1073/pnas.2322973121. Epub 2024 Jun 4.

Student and teacher performance during COVID-19 lockdown: An investigation of associated features and complex interactions using multiple data sources.

PLoS One. 2023 Oct 25;18(10):e0291689. doi: 10.1371/journal.pone.0291689. eCollection 2023.

Testing Thermostatic Bath End-Scale Stability for Calibration Performance with a Multiple-Sensor Ensemble Using ARIMA, Temporal Stochastics and a Quantum Walker Algorithm.

Sensors (Basel). 2023 Feb 17;23(4):2267. doi: 10.3390/s23042267.
