K-SPAN: A lexical database of Korean surface phonetic forms and phonological neighborhood density statistics.

Affiliations

Department of Korean Language and Literature, Korea University, 145 Anam-ro Seongbuk-gu, Seoul, 02841, South Korea.

Laboratoire de Sciences Cognitives et Psycholinguistique (ENS, EHESS, CNRS), Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL Research University, 29, rue d'Ulm, 75005, Paris, France.

Publication information

Behav Res Methods. 2017 Oct;49(5):1939-1950. doi: 10.3758/s13428-016-0836-8.

Abstract

This article presents K-SPAN (Korean Surface Phonetics and Neighborhoods), a database of surface phonetic forms and several measures of phonological neighborhood density for 63,836 Korean words. Currently publicly available Korean corpora are limited by the fact that they only provide orthographic representations in Hangeul, which is problematic since phonetic forms in Korean cannot be reliably predicted from orthographic forms. We describe the method used to derive the surface phonetic forms from a publicly available orthographic corpus of Korean, and report on several statistics calculated using this database; namely, segment unigram frequencies, which are compared to previously reported results, along with segment-based and syllable-based neighborhood density statistics for three types of representation: an "orthographic" form, which is a quasi-phonological representation, a "conservative" form, which maintains all known contrasts, and a "modern" form, which represents the pronunciation of contemporary Seoul Korean. These representations are rendered in an ASCII-encoded scheme, which allows users to query the corpus without having to read Korean orthography, and permits the calculation of a wide range of phonological measures.
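As a rough illustration of the two kinds of statistics mentioned above, the following minimal sketch computes segment unigram counts and a segment-based neighborhood density over a toy lexicon. It rests on assumptions not stated in the abstract: a neighbor is taken to be any other word whose form differs by exactly one segment substitution, addition, or deletion (a common definition, not necessarily the one K-SPAN uses), and the word IDs and segment symbols are invented placeholders rather than K-SPAN's actual ASCII encoding.

```python
from collections import Counter


def is_neighbor(a, b):
    """Return True if sequences a and b differ by exactly one
    substitution, addition, or deletion of a single unit."""
    if a == b:
        return False  # identical forms (homophones) are not counted
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: deleting one unit from the longer form
    # must yield the shorter form.
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


def neighborhood_density(lexicon):
    """Map each word ID to the number of its neighbors in the lexicon.
    Pairwise comparison is quadratic; fine for a toy example only."""
    forms = {word: tuple(units) for word, units in lexicon.items()}
    return {
        word: sum(is_neighbor(form, other)
                  for other_word, other in forms.items()
                  if other_word != word)
        for word, form in forms.items()
    }


def segment_unigram_counts(lexicon):
    """Count how often each segment occurs across all forms (type counts)."""
    return Counter(seg for units in lexicon.values() for seg in units)


# Hypothetical toy lexicon: IDs and segment symbols are placeholders,
# not entries or encodings taken from K-SPAN.
toy_lexicon = {
    "w1": ["p", "a", "l"],
    "w2": ["m", "a", "l"],
    "w3": ["p", "a"],
    "w4": ["p", "a", "l", "i"],
    "w5": ["s", "o"],
}

density = neighborhood_density(toy_lexicon)
unigrams = segment_unigram_counts(toy_lexicon)
print(density)
print(unigrams.most_common(3))
```

Because forms are stored as sequences of units rather than as raw strings, the same functions would give a syllable-based density if each unit were a syllable instead of a segment, and multi-character segment symbols need no special handling.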

