Sung Yao-Ting, Chang Tao-Hsing, Lin Wei-Chun, Hsieh Kuan-Sheng, Chang Kuo-En
Department of Educational Psychology and Counseling, National Taiwan Normal University, Taipei, Taiwan.
Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, 415, Jiangong Road, Kaohsiung City, 80778, Taiwan, Republic of China.
Behav Res Methods. 2016 Dec;48(4):1238-1251. doi: 10.3758/s13428-015-0649-1.
Textual analysis has been applied in various fields, such as discourse analysis, corpus studies, text leveling, and automated essay evaluation. Several tools have been developed for analyzing texts written in alphabetic languages such as English and Spanish. However, no tool is currently available for analyzing Chinese-language texts. This article introduces a tool for the automated analysis of simplified and traditional Chinese texts, called the Chinese Readability Index Explorer (CRIE). Composed of four subsystems and incorporating 82 multilevel linguistic features, CRIE performs the major tasks of segmentation, syntactic parsing, and feature extraction. Furthermore, the integration of linguistic features with machine learning models enables CRIE to provide leveling and diagnostic information for language arts texts, texts for learning Chinese as a foreign language, and texts containing domain knowledge. The usage and validation of the functions provided by CRIE are also described.
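To illustrate what feature extraction over segmented text can look like, the following is a minimal sketch, not CRIE's implementation. It assumes the text has already been segmented into words (segmentation itself is out of scope here). The function name `extract_surface_features`, the `SurfaceFeatures` record, and the four surface measures are hypothetical choices made for illustration; they are not taken from CRIE's 82 features.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SurfaceFeatures:
    """Hypothetical surface-level readability features (not CRIE's feature set)."""
    n_sentences: int
    n_words: int
    n_chars: int
    mean_words_per_sentence: float
    mean_chars_per_word: float
    type_token_ratio: float


def extract_surface_features(sentences: List[List[str]]) -> SurfaceFeatures:
    """Compute simple counts and ratios from pre-segmented sentences.

    `sentences` is a list of sentences, each a list of word strings.
    Assumption: word segmentation was done upstream by some other tool.
    """
    words = [word for sentence in sentences for word in sentence]
    if not words:
        raise ValueError("need at least one word")
    n_chars = sum(len(word) for word in words)
    return SurfaceFeatures(
        n_sentences=len(sentences),
        n_words=len(words),
        n_chars=n_chars,
        mean_words_per_sentence=len(words) / len(sentences),
        mean_chars_per_word=n_chars / len(words),
        # Distinct words divided by total words: a rough lexical-diversity measure.
        type_token_ratio=len(set(words)) / len(words),
    )


# Two short, already-segmented sentences used purely as toy input.
sample = [["我", "喜欢", "阅读"], ["阅读", "很", "有趣"]]
features = extract_surface_features(sample)
print(features)
```

Feature vectors of this kind could then be passed to a classifier to predict a reading level, which is the general pattern the abstract describes; the specific features and models CRIE uses are documented in the article itself.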