The Children's Picture Books Lexicon (CPB-LEX): A large-scale lexical database from children's picture books.

Affiliations

Faculty of Education, University of Hong Kong, Pok Fu Lam, Hong Kong.

Senior Lecturer, Centre for Smart Analytics & Institute of Innovation, Science and Sustainability, Federation University Australia, Mount Helen, Australia.

Publication information

Behav Res Methods. 2024 Aug;56(5):4504-4521. doi: 10.3758/s13428-023-02198-y. Epub 2023 Aug 11.

DOI:10.3758/s13428-023-02198-y
PMID:37566336
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11289352/
Abstract

This article presents CPB-LEX, a large-scale database of lexical statistics derived from children's picture books (age range 0-8 years). Such a database is essential for research in psychology, education and computational modelling, where rich details on the vocabulary of early print exposure are required. CPB-LEX was built through an innovative method of computationally extracting lexical information from automatic speech-to-text captions and subtitle tracks generated from social media channels dedicated to reading picture books aloud. It consists of approximately 25,585 types (wordforms) and their frequency norms (raw and Zipf-transformed), a lexicon of bigrams (two-word sequences and their transitional probabilities) and a document-term matrix (which shows the importance of each word in the corpus in each book). Several immediate contributions of CPB-LEX to behavioural science research are reported, including that the new CPB-LEX frequency norms strongly predict age of acquisition and outperform comparable child-input lexical databases. The database allows researchers and practitioners to extract lexical statistics for high-frequency words which can be used to develop word lists. The paper concludes with an investigation of how CPB-LEX can be used to extend recent modelling research on the lexical diversity children receive from picture books in addition to child-directed speech. Our model shows that the vocabulary input from a relatively small number of picture books can dramatically enrich vocabulary exposure from child-directed speech and potentially assist children with vocabulary input deficits. The database is freely available from the Open Science Framework repository: https://tinyurl.com/4este73c .
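The abstract names two kinds of lexical statistics, Zipf-transformed frequency norms and bigram transitional probabilities, without giving formulas. The sketch below shows one common way such values are computed; it is an illustration under stated assumptions, not the authors' pipeline. It assumes the widely used Zipf scale (log10 of frequency per billion tokens, with no smoothing) and the forward transitional probability P(w2 | w1) = count(w1 w2) / count(w1 as first word of a bigram). The function names and the two toy "books" are hypothetical.

```python
import math
from collections import Counter


def zipf(count, total_tokens):
    """Zipf value: log10 of frequency per billion tokens (unsmoothed; an assumption)."""
    return math.log10(count / total_tokens * 1e9)


def lexical_stats(books):
    """Compute frequency norms and forward transitional probabilities.

    books: list of token lists, one per book (a toy stand-in for caption transcripts).
    Returns (freq, tp), where freq maps word -> (raw count, Zipf value)
    and tp maps (w1, w2) -> P(w2 | w1).
    """
    unigrams = Counter()
    bigrams = Counter()
    first_counts = Counter()  # how often each word opens a bigram
    for tokens in books:
        unigrams.update(tokens)
        # Bigrams are counted within a book, never across book boundaries.
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[(w1, w2)] += 1
            first_counts[w1] += 1
    total = sum(unigrams.values())
    freq = {w: (c, zipf(c, total)) for w, c in unigrams.items()}
    tp = {(w1, w2): c / first_counts[w1] for (w1, w2), c in bigrams.items()}
    return freq, tp


# Hypothetical toy corpus of two "books".
books = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
freq, tp = lexical_stats(books)
```

In this toy corpus "the" is followed by four different words once each, so every bigram starting with "the" has a transitional probability of 0.25, while "sat" is always followed by "on". The released database may use a smoothed Zipf formula or a different probability estimate; the files in the Open Science Framework repository are the authoritative source.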

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b58/11289352/7790aa3c77d2/13428_2023_2198_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b58/11289352/26cde1fbbe14/13428_2023_2198_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b58/11289352/85a82d69eaf7/13428_2023_2198_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b58/11289352/f4c37ecde76c/13428_2023_2198_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b58/11289352/f7c949b05128/13428_2023_2198_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b58/11289352/a0cbbca1d0ed/13428_2023_2198_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b58/11289352/372cca4ace8e/13428_2023_2198_Fig7_HTML.jpg