Adding part-of-speech information to the SUBTLEX-US word frequencies.

Affiliation

Department of Experimental Psychology, Ghent University, Henri Dunantlaan 2, 9000, Gent, Belgium.

Publication

Behav Res Methods. 2012 Dec;44(4):991-7. doi: 10.3758/s13428-012-0190-4.

Abstract

The SUBTLEX-US corpus has been tagged with the CLAWS part-of-speech tagger, so that researchers have information about the possible word classes (parts-of-speech, or PoSs) of the entries. Five new columns have been added to the SUBTLEX-US word frequency list: the dominant (most frequent) PoS for the entry, the frequency of the dominant PoS, the frequency of the dominant PoS relative to the entry's total frequency, all PoSs observed for the entry, and the respective frequencies of these PoSs. Because the current definition of lemma frequency does not seem to provide word recognition researchers with useful information (as illustrated by a comparison of the lemma frequencies and the word form frequencies from the Corpus of Contemporary American English), we have not provided a column with this variable. Instead, we hope that the full list of PoS frequencies will help researchers to collectively determine which combination of frequencies is the most informative.
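The five columns can be derived from per-entry counts of tagged tokens. The Python below is a minimal sketch of that derivation, not the authors' pipeline. It assumes the tagger output is already available as (word, PoS) pairs with coarse labels such as "Verb" and "Noun" (CLAWS itself emits finer-grained tags). The function `pos_columns`, the field names, and the dot-joined string format for the last two columns are hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PosColumns:
    """Hypothetical container for the five per-entry PoS columns."""
    dom_pos: str                  # dominant (most frequent) PoS
    freq_dom_pos: int             # frequency of the dominant PoS
    relative_freq_dom_pos: float  # dominant PoS frequency / total entry frequency
    all_pos: str                  # all observed PoSs, most frequent first
    all_freqs: str                # their respective frequencies, same order


def pos_columns(tagged_tokens, entry):
    """Compute the PoS columns for one word form (exact string match).

    tagged_tokens: iterable of (word, pos) pairs; coarse PoS labels are assumed.
    """
    # Count how often each PoS was assigned to this word form.
    counts = Counter(pos for word, pos in tagged_tokens if word == entry)
    if not counts:
        raise ValueError(f"entry {entry!r} not found")
    # most_common() sorts by descending count; ties keep first-seen order.
    ranked = counts.most_common()
    dom_pos, freq_dom = ranked[0]
    total = sum(counts.values())  # the entry's total frequency
    return PosColumns(
        dom_pos=dom_pos,
        freq_dom_pos=freq_dom,
        relative_freq_dom_pos=freq_dom / total,
        # Joining with "." is an assumed serialization for illustration.
        all_pos=".".join(pos for pos, _ in ranked),
        all_freqs=".".join(str(n) for _, n in ranked),
    )


if __name__ == "__main__":
    tokens = [("walk", "Verb"), ("walk", "Noun"), ("walk", "Verb"),
              ("the", "Article"), ("walk", "Verb")]
    print(pos_columns(tokens, "walk"))
```

For the toy token list above, "walk" is tagged as a verb three times and as a noun once, so the dominant PoS is "Verb" with frequency 3 and a relative frequency of 0.75. The full PoS and frequency lists are kept alongside it so that other frequency combinations can still be computed later.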