A university map of course knowledge.

Affiliations

Graduate School of Education, University of California, Berkeley, California, United States of America.

Department of Psychology, Stanford University, Stanford, California, United States of America.

Publication information

PLoS One. 2020 Sep 30;15(9):e0233207. doi: 10.1371/journal.pone.0233207. eCollection 2020.

Abstract

Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass, and as academia and industry seek methods to understand and reason about the information these data encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational structures, which have also served as artifacts of anthropological interest. Natural language is, however, only a fraction of the big data deluge. Here we show that latent semantic structure can be informed by behavioral data and that domain knowledge can be extracted from this structure through visualization and through a novel mapping of the text descriptions of elements onto this behaviorally informed representation. In this study, we use the course enrollment histories of 124,000 students at a public university to learn vector representations of its courses. From these course-selection-informed representations, a notable 88% of course attribute information was recovered, as well as 40% of course relationships constructed from prior domain knowledge and evaluated by analogy (e.g., Math 1B is to Honors Math 1B as Physics 7B is to Honors Physics 7B). To aid in interpretation of the learned structure, we create a semantic interpolation, translating course vectors to a bag-of-words of their respective catalog descriptions via regression. We find that representations learned from enrollment histories resolved courses to a level of semantic fidelity exceeding that of their catalog descriptions, revealing nuanced content differences between similar courses, as well as accurately describing departments for which the dataset had no course descriptions. We end with a discussion of the mechanisms by which this semantic structure may be informed and of the implications for the nascent research and practice of data science.
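The analogy evaluation mentioned in the abstract can be illustrated with the 3CosAdd rule commonly used with skip-gram embeddings: for "a is to b as c is to ?", choose the item whose vector is most cosine-similar to b − a + c, excluding the three query items. The abstract does not state the paper's exact scoring rule, so this is a minimal sketch under that assumption. The course vectors below are hypothetical toy values (axes loosely "math", "physics", "honors"), not embeddings learned from enrollment histories, and the helper names are illustrative.

```python
import math

# Hypothetical toy course vectors (axes roughly: math, physics, honors).
# Illustrative values only, NOT embeddings learned in the paper.
COURSES = {
    "Math 1B": [1.0, 0.0, 0.0],
    "Honors Math 1B": [1.0, 0.0, 1.0],
    "Physics 7A": [0.1, 1.0, 0.0],
    "Physics 7B": [0.0, 1.0, 0.0],
    "Honors Physics 7B": [0.0, 1.0, 1.0],
}

# Each tuple reads "a is to b as c is to d".
ANALOGIES = [
    ("Math 1B", "Honors Math 1B", "Physics 7B", "Honors Physics 7B"),
    ("Physics 7B", "Honors Physics 7B", "Math 1B", "Honors Math 1B"),
]

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def solve_analogy(vectors, a, b, c):
    """3CosAdd (assumed scoring rule): return the item closest to b - a + c,
    computed on unit-normalized vectors and excluding a, b and c themselves."""
    va, vb, vc = (normalize(vectors[k]) for k in (a, b, c))
    target = [y - x + z for x, y, z in zip(va, vb, vc)]
    candidates = [k for k in vectors if k not in (a, b, c)]
    return max(candidates, key=lambda k: cosine(vectors[k], target))

def analogy_accuracy(vectors, analogies):
    """Fraction of analogies whose top-ranked answer is the expected item."""
    hits = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in analogies)
    return hits / len(analogies)

print(analogy_accuracy(COURSES, ANALOGIES))  # prints 1.0
```

The toy vectors are constructed so that every analogy is solvable; with real learned course vectors, the same accuracy computation over a set of domain-derived analogies is what a figure like the reported 40% would summarize.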
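The semantic interpolation step, translating course vectors into bag-of-words scores via regression, can be sketched as a multi-output linear regression from vector space to vocabulary space. The abstract says only "via regression", so closed-form ridge regression is an assumption here, as are the two-dimensional course vectors, the three-word vocabulary, and the hypothetical helpers `fit_ridge` and `describe`. Once fitted, the weights can assign a word profile to a course vector that has no catalog description, which is the kind of use the abstract reports for departments lacking descriptions.

```python
# Hypothetical toy data: 2-D course vectors and bag-of-words counts over a
# 3-word vocabulary. Not data from the paper.
VOCAB = ["integral", "force", "proof"]
COURSE_VECTORS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
BOW_COUNTS = [[2, 0, 1], [1, 2, 0], [3, 2, 1]]

def solve_linear(a, b):
    """Solve a @ x = b (a: n x n, b: n x m) by Gauss-Jordan elimination
    with partial pivoting; returns x as an n x m list of lists."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit_ridge(X, Y, lam=1e-2):
    """Closed-form ridge regression W = (X^T X + lam*I)^-1 X^T Y, mapping
    course vectors (rows of X) to word scores (rows of Y)."""
    k, v = len(X[0]), len(Y[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [[sum(xr[i] * yr[w] for xr, yr in zip(X, Y)) for w in range(v)]
           for i in range(k)]
    return solve_linear(xtx, xty)

def describe(vector, W, vocab, top_n=2):
    """Predict word scores for a course vector and return the top_n words."""
    scores = [sum(x * W[i][w] for i, x in enumerate(vector))
              for w in range(len(vocab))]
    ranked = sorted(range(len(vocab)), key=lambda w: scores[w], reverse=True)
    return [vocab[w] for w in ranked[:top_n]]

W = fit_ridge(COURSE_VECTORS, BOW_COUNTS)
# A course vector with no catalog description still gets a word profile.
print(describe([1.0, 0.1], W, VOCAB))  # prints ['integral', 'proof']
```

A realistic vocabulary is large and sparse, so in practice a library solver such as scikit-learn's `Ridge` (which accepts multi-output targets) would normally replace the hand-written elimination used here to keep the sketch dependency-free.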


