Identifying missing dictionary entries with frequency-conserving context models.

Authors

Williams Jake Ryland, Clark Eric M, Bagrow James P, Danforth Christopher M, Dodds Peter Sheridan

Affiliations

Department of Mathematics & Statistics, Vermont Complex Systems Center, Computational Story Lab, and The Vermont Advanced Computing Core, The University of Vermont, Burlington, Vermont 05401, USA.

Publication information

Phys Rev E Stat Nonlin Soft Matter Phys. 2015 Oct;92(4):042808. doi: 10.1103/PhysRevE.92.042808. Epub 2015 Oct 12.

DOI: 10.1103/PhysRevE.92.042808
PMID: 26565290

Abstract

In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility patterns) where one has ordered categorical data (e.g., sounds, genes, and locations). Our approach focuses on the phrase (whether word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with the Wiktionary, an extensive, online, collaborative, and open-source dictionary that contains over 100000 phrasal definitions, we develop highly effective filters for the identification of meaningful, missing phrase entries. With our predictions we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough, lexical extraction technique and expanding our knowledge of the defined English lexicon of phrases.
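The abstract does not spell out how word-conserving phrase-frequency data are generated or how missing entries are filtered. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes a random partition in which each gap between adjacent words becomes a phrase boundary with probability q, so every word lands in exactly one phrase and the word total is conserved. The names random_partition, phrase_counts, and missing_candidates, the parameter q, and the plain "not in the dictionary, ranked by count" filter are all hypothetical stand-ins for the trained filters the abstract mentions.

```python
import random
from collections import Counter


def random_partition(tokens, q=0.5, rng=None):
    """Split tokens into contiguous phrases.

    Assumption: each gap after a token is a phrase boundary with
    probability q. Every token ends up in exactly one phrase.
    """
    rng = rng or random.Random()
    phrases, current = [], []
    for tok in tokens:
        current.append(tok)
        if rng.random() < q:  # close the current phrase here
            phrases.append(tuple(current))
            current = []
    if current:  # flush a trailing, unclosed phrase
        phrases.append(tuple(current))
    return phrases


def phrase_counts(tokens, q=0.5, seed=0):
    """Count the phrases from one seeded random partition of the text."""
    rng = random.Random(seed)
    return Counter(random_partition(tokens, q, rng))


def missing_candidates(counts, dictionary, min_count=1):
    """Hypothetical filter: multiword phrases absent from `dictionary`,
    ranked by descending count, then alphabetically."""
    found = []
    for phrase, count in counts.items():
        text = " ".join(phrase)
        if len(phrase) > 1 and count >= min_count and text not in dictionary:
            found.append((text, count))
    return sorted(found, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    tokens = "new york is a big city and new york is busy".split()
    counts = phrase_counts(tokens, q=0.5, seed=1)
    # Word conservation: phrase lengths weighted by count sum to the text length.
    print(sum(len(p) * c for p, c in counts.items()) == len(tokens))
    print(missing_candidates(counts, dictionary={"big city"}))
```

A single partition is noisy; averaging counts over many seeds would give smoother phrase frequencies, and a real filter would need to be trained against dictionary entries rather than relying on raw counts.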



Cited by

1. The Lexicocalorimeter: Gauging public health through caloric input and output on social media.
PLoS One. 2017 Feb 10;12(2):e0168893. doi: 10.1371/journal.pone.0168893. eCollection 2017.