The development of a digital story-retell elicitation and analysis tool through citizen science data collection, software development and machine learning.

Author information

Bright Rebecca, Ashton Elaine, Mckean Cristina, Wren Yvonne

Affiliations

Therapy Box, London, United Kingdom.

School of Education, Communication and Language Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom.

Publication information

Front Psychol. 2023 Apr 20;14:989499. doi: 10.3389/fpsyg.2023.989499. eCollection 2023.

DOI:10.3389/fpsyg.2023.989499

PMID:37287780

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10243469/

Abstract

BACKGROUND

To realise the potential benefits of technology for language assessment in speech and language therapy, large samples of naturalistic language data must be collected and analysed. These samples enable novel software applications to be developed and tested with data relevant to their intended clinical use. However, collecting and analysing such data can be costly and time-consuming. This paper describes the development of a novel application designed to elicit and analyse young children's story retell narratives and to provide metrics on the child's use of grammatical structures (micro-structure) and story grammar elements (macro-structure). Key aspects of development were (1) methods to collect story retells and to ensure accurate transcription and segmentation of utterances; (2) testing the reliability of the application's analysis of micro-structure elements in children's story retells; and (3) development of an algorithm to analyse narrative macro-structure elements.

METHODS

A co-design process was used to design an app for gathering story retell samples from children using mobile technology. A citizen science approach, promoted through mainstream online marketing channels, the media and billboard advertisements, was used to encourage participation from children across the United Kingdom. A stratified sampling framework, based on partial postcodes and the relevant indices of deprivation, was used to obtain a sample representative across age, gender and five bands of socio-economic disadvantage. Trained Research Associates (RAs) completed transcription and micro- and macro-structure analyses of the language samples. Methods to improve transcriptions produced by automated speech recognition were developed to enable reliable analysis. RA micro-structure analyses were compared with those generated by the digital application to test its reliability using the intra-class correlation coefficient (ICC). RA macro-structure analyses were used to train an algorithm to produce macro-structure metrics. Finally, results from the macro-structure algorithm were compared, again using ICC, against a subset of RA macro-structure analyses not used in training to test its reliability.
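The stratified selection step could be sketched roughly as follows. This is a minimal illustration and not the authors' procedure: the field names (`age`, `gender`, `deprivation_band`), the fixed per-cell quota and the random seed are all assumptions, and the abstract does not describe how a deprivation band was derived from a partial postcode beyond naming the relevant indices of deprivation.

```python
import random
from collections import defaultdict


def stratified_sample(profiles, quota_per_stratum, seed=0):
    """Draw up to `quota_per_stratum` profiles from each stratum.

    A stratum is one (age, gender, deprivation_band) combination.
    All field names and the fixed quota are hypothetical.
    """
    rng = random.Random(seed)  # seeded so the draw is repeatable

    # Group profiles by stratum.
    strata = defaultdict(list)
    for profile in profiles:
        key = (profile["age"], profile["gender"], profile["deprivation_band"])
        strata[key].append(profile)

    # Sample without replacement inside each stratum.
    selected = []
    for key in sorted(strata):
        members = strata[key]
        take = min(quota_per_stratum, len(members))
        selected.extend(rng.sample(members, take))
    return selected


# Toy data: 2 ages x 2 genders x 5 deprivation bands, 3 profiles per cell.
toy_profiles = [
    {"id": f"{age}-{gender}-{band}-{i}", "age": age,
     "gender": gender, "deprivation_band": band}
    for age in (4, 5)
    for gender in ("F", "M")
    for band in range(1, 6)
    for i in range(3)
]

sample = stratified_sample(toy_profiles, quota_per_stratum=2)
print(len(sample))  # 20 strata x 2 profiles each = 40
```

A fixed quota per cell is the simplest choice; a real framework might instead set quotas proportional to population figures for each stratum.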

RESULTS

A total of 4,517 profiles were created in the data-collection app; from these, a final set of 599 participants who fulfilled the stratified sampling criteria was drawn. The story retells ranged from 35.66 s to 251.4 s in length, with word counts ranging from 37 to 496 and a mean of 148.29 words. ICC between the RA and application micro-structure analyses ranged from 0.213 to 1.0, with 41 of the 44 comparisons reaching 'good' (0.70-0.90) or 'excellent' (>0.90) levels of reliability. ICC between the RA and application macro-structure features was calculated for 85 samples not used in training the algorithm. These ICC values ranged from 0.5577 to 0.939, with 5 of the 7 metrics being 'good' or better.
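The abstract does not state which form of ICC was used. As an illustration only, the sketch below computes the two-way random-effects, absolute-agreement, single-measure coefficient, often written ICC(2,1), and labels the result with the bands quoted above ('good' 0.70-0.90, 'excellent' >0.90). The function names, the 'below good' label and the toy scores are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per sample and one column per rater.

    Assumes at least two rows, two columns and some variation in the scores.
    """
    n = len(ratings)       # number of samples (e.g. story retells)
    k = len(ratings[0])    # number of raters (e.g. RA and application)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def reliability_band(icc):
    """Label an ICC with the bands quoted in the abstract."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.70:
        return "good"
    return "below good"  # hypothetical label; the abstract names no lower band


# Toy example: one micro-structure count per child, scored by RA and by app.
ra_scores = [3, 5, 2, 8, 6]
app_scores = [3, 4, 2, 8, 7]
value = icc_2_1(list(zip(ra_scores, app_scores)))
print(round(value, 3), reliability_band(value))
```

Absolute-agreement ICC penalises a systematic offset between the RA and the application, which seems appropriate when the application's counts are meant to stand in for the RA's; a consistency-type ICC would ignore such an offset.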

CONCLUSION

Work to date has demonstrated the potential of semi-automated transcription and linguistic analysis to provide reliable, detailed and informative narrative language analysis for young children, and of citizen science approaches using mobile technologies to collect representative and informative research data. Clinical evaluation of this new app is ongoing, so we do not yet have data documenting its developmental or clinical sensitivity and specificity.
