A university map of course knowledge.

Affiliations

Graduate School of Education, University of California, Berkeley, California, United States of America.

Department of Psychology, Stanford University, Stanford, California, United States of America.

Publication information

PLoS One. 2020 Sep 30;15(9):e0233207. doi: 10.1371/journal.pone.0233207. eCollection 2020.

DOI: 10.1371/journal.pone.0233207
PMID: 32997664
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7526902/
Abstract

Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational structures which have also served as artifacts of anthropological interest. Natural language is, however, only a fraction of the big data deluge. Here we show that latent semantic structure can be informed by behavioral data and that domain knowledge can be extracted from this structure through visualization and a novel mapping of the text descriptions of elements onto this behaviorally informed representation. In this study, we use the course enrollment histories of 124,000 students at a public university to learn vector representations of its courses. From these course selection informed representations, a notable 88% of course attribute information was recovered, as well as 40% of course relationships constructed from prior domain knowledge and evaluated by analogy (e.g., Math 1B is to Honors Math 1B as Physics 7B is to Honors Physics 7B). To aid in interpretation of the learned structure, we create a semantic interpolation, translating course vectors to a bag-of-words of their respective catalog descriptions via regression. We find that representations learned from enrollment histories resolved courses to a level of semantic fidelity exceeding that of their catalog descriptions, revealing nuanced content differences between similar courses, as well as accurately describing departments the dataset had no course descriptions for. We end with a discussion of the possible mechanisms by which this semantic structure may be informed and implications for the nascent research and practice of data science.
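
The abstract names two operations on the learned course vectors: evaluating course relationships by analogy, and translating a course vector into a bag-of-words through regression. The sketch below illustrates both on toy data and is not the authors' implementation. The 3-dimensional vectors, the "History 7A" distractor course, the three-word vocabulary, and the function names are all invented for illustration. The abstract says only "via regression", so ordinary least squares is swapped in here as a stand-in.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-d vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    Returns the course whose vector is closest (by cosine) to
    vectors[b] - vectors[a] + vectors[c], excluding a, b and c.
    """
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = {k: v for k, v in vectors.items() if k not in (a, b, c)}
    return max(candidates, key=lambda k: cosine(target, candidates[k]))

def fit_vector_to_words(course_matrix, bow_matrix):
    """Least-squares weights W so that course_matrix @ W approximates bow_matrix."""
    weights, *_ = np.linalg.lstsq(course_matrix, bow_matrix, rcond=None)
    return weights

# Toy course vectors (assumption: real vectors would be learned from enrollments).
toy_vectors = {
    "Math 1B": np.array([1.0, 0.0, 0.0]),
    "Honors Math 1B": np.array([1.0, 0.0, 1.0]),
    "Physics 7B": np.array([0.0, 1.0, 0.0]),
    "Honors Physics 7B": np.array([0.0, 1.0, 1.0]),
    "History 7A": np.array([0.5, 0.5, 0.0]),  # hypothetical distractor course
}

# Analogy from the abstract: Math 1B : Honors Math 1B :: Physics 7B : ?
answer = solve_analogy(toy_vectors, "Math 1B", "Honors Math 1B", "Physics 7B")

# Invented vocabulary and bag-of-words rows for three courses "with descriptions".
vocab = ["math", "physics", "honors"]
described = ["Math 1B", "Honors Math 1B", "Physics 7B"]
course_matrix = np.stack([toy_vectors[name] for name in described])
bow_matrix = np.array([
    [1.0, 0.0, 0.0],  # Math 1B
    [1.0, 0.0, 1.0],  # Honors Math 1B
    [0.0, 1.0, 0.0],  # Physics 7B
])
weights = fit_vector_to_words(course_matrix, bow_matrix)

# Describe a course that had no description in the fitting data.
scores = toy_vectors["Honors Physics 7B"] @ weights
predicted_words = [word for word, s in zip(vocab, scores) if s > 0.5]

print(answer)
print(predicted_words)
```

In this toy setup the "honors" direction is shared across departments, which is why both the analogy and the word prediction for the undescribed course succeed. Real enrollment-derived vectors would be higher-dimensional and noisier, consistent with the 40% analogy recovery the abstract reports.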

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db8f/7526902/fede8615f374/pone.0233207.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db8f/7526902/d1a625d36a87/pone.0233207.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db8f/7526902/d1c3ff5ffdd0/pone.0233207.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db8f/7526902/1bc896b655fe/pone.0233207.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db8f/7526902/274cf864d848/pone.0233207.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db8f/7526902/9fa0c1f59f0a/pone.0233207.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db8f/7526902/ac7c7299d2be/pone.0233207.g007.jpg

Similar articles

1
A university map of course knowledge.
PLoS One. 2020 Sep 30;15(9):e0233207. doi: 10.1371/journal.pone.0233207. eCollection 2020.
2
Structure and deterioration of semantic memory: a neuropsychological and computational investigation.
Psychol Rev. 2004 Jan;111(1):205-35. doi: 10.1037/0033-295X.111.1.205.
3
Jointly learning word embeddings using a corpus and a knowledge base.
PLoS One. 2018 Mar 12;13(3):e0193094. doi: 10.1371/journal.pone.0193094. eCollection 2018.
4
Incorporating linguistic knowledge for learning distributed word representations.
PLoS One. 2015 Apr 13;10(4):e0118437. doi: 10.1371/journal.pone.0118437. eCollection 2015.
5
Emergence of analogy from relation learning.
Proc Natl Acad Sci U S A. 2019 Mar 5;116(10):4176-4181. doi: 10.1073/pnas.1814779116. Epub 2019 Feb 15.
6
Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation.
BMC Med Inform Decis Mak. 2018 Jul 6;18(1):61. doi: 10.1186/s12911-018-0645-3.
7
Newly-acquired words are more phonologically robust in verbal short-term memory when they have associated semantic representations.
Neuropsychologia. 2017 Apr;98:85-97. doi: 10.1016/j.neuropsychologia.2016.03.006. Epub 2016 Mar 8.
8
Dependency-based Siamese long short-term memory network for learning sentence representations.
PLoS One. 2018 Mar 7;13(3):e0193919. doi: 10.1371/journal.pone.0193919. eCollection 2018.
9
Mathematical modeling and mining real-world Big education datasets with application to curriculum mapping.
Math Biosci Eng. 2021 May 24;18(4):4450-4460. doi: 10.3934/mbe.2021225.
10
Knowledge Author: facilitating user-driven, domain content development to support clinical information extraction.
J Biomed Semantics. 2016 Jun 23;7(1):42. doi: 10.1186/s13326-016-0086-9.

Cited by

1
Interdisciplinary college curriculum and its labor market implications.
Proc Natl Acad Sci U S A. 2023 Oct 24;120(43):e2221915120. doi: 10.1073/pnas.2221915120. Epub 2023 Oct 16.
2
The impact of timetable on student's absences and performance.
PLoS One. 2021 Jun 25;16(6):e0253256. doi: 10.1371/journal.pone.0253256. eCollection 2021.