Bright Rebecca, Ashton Elaine, Mckean Cristina, Wren Yvonne
Therapy Box, London, United Kingdom.
School of Education, Communication and Language Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom.
Front Psychol. 2023 Apr 20;14:989499. doi: 10.3389/fpsyg.2023.989499. eCollection 2023.
To realise the potential benefits of technology for language assessment in speech and language therapy, large samples of naturalistic language data must be collected and analysed. These samples enable novel software applications to be developed and tested with data relevant to their intended clinical application. However, collecting and analysing such data can be costly and time-consuming. This paper describes the development of a novel application designed to elicit and analyse young children's story retell narratives and to provide metrics on the child's use of grammatical structures (micro-structure) and story grammar (macro-structure elements). Key aspects of development were (1) methods to collect story retells and ensure accurate transcription and segmentation of utterances; (2) testing the reliability of the application in analysing micro-structure elements in children's story retells; and (3) development of an algorithm to analyse narrative macro-structure elements.
A co-design process was used to design an app for gathering story retell samples from children using mobile technology. A citizen science approach using mainstream online marketing channels, the media and billboard advertisements was used to encourage participation from children across the United Kingdom. A stratified sampling framework was used to ensure that a representative sample was obtained across age, gender and five bands of socio-economic disadvantage, using partial postcodes and the relevant indices of deprivation. Trained Research Associates (RAs) completed transcription and micro- and macro-structure analysis of the language samples. Methods to improve transcriptions produced by automated speech recognition were developed to enable reliable analysis. RA micro-structure analyses were compared with those generated by the digital application to test its reliability using the intra-class correlation coefficient (ICC). RA macro-structure analyses were used to train an algorithm to produce macro-structure metrics. Finally, results from the macro-structure algorithm were compared against a subset of RA macro-structure analyses not used in training to test its reliability using ICC.
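The stratified sampling step can be illustrated with a minimal sketch. It is not the authors' procedure: the field names (`age_band`, `gender`, `deprivation_band`), the equal per-cell quota and the random draw within each cell are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(profiles, quota_per_cell, seed=0):
    """Draw up to `quota_per_cell` profiles from every stratum.

    Each profile is a dict with hypothetical keys 'age_band', 'gender'
    and 'deprivation_band' (1-5); a stratum is one combination of the three.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group profiles into strata (cells of the sampling framework).
    cells = defaultdict(list)
    for profile in profiles:
        key = (profile["age_band"], profile["gender"], profile["deprivation_band"])
        cells[key].append(profile)

    # Sample without replacement inside each cell, capped by the cell size.
    selected = []
    for key in sorted(cells):
        members = cells[key]
        take = min(quota_per_cell, len(members))
        selected.extend(rng.sample(members, take))
    return selected
```

Cells with fewer profiles than the quota contribute everything they have, so under-filled strata are easy to spot by counting the output per cell.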
A total of 4,517 profiles were created in the app used for data collection, and from these participants a final set of 599 was drawn which fulfilled the stratified sampling criteria. The story retells ranged from 35.66 s to 251.4 s in length and had word counts ranging from 37 to 496, with a mean of 148.29 words. ICC between the RA and application micro-structure analyses ranged from 0.213 to 1.0, with 41 out of a total of 44 comparisons reaching 'good' (0.70-0.90) or 'excellent' (>0.90) levels of reliability. ICC between the RA and application macro-structure features was calculated for 85 samples not used in training the algorithm. ICC ranged from 0.5577 to 0.939, with 5 out of 7 metrics being 'good' or better.
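As an illustration of the reliability comparison, the sketch below computes an ICC for paired RA and application scores and maps it to the bands quoted above. The abstract does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), is assumed here, and the label "below good" is an illustrative name for values under 0.70.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of rows, one row per sample, one column per rater."""
    n = len(ratings)      # number of samples (e.g. story retells)
    k = len(ratings[0])   # number of raters (e.g. RA and application)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-sample mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reliability_band(icc):
    """Map an ICC to the bands quoted in the text."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.70:
        return "good"
    return "below good"  # label assumed; the text names only the upper bands
```

Because this form measures absolute agreement, a constant offset between the RA and the application lowers the ICC even when their scores are perfectly correlated.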
Work to date has demonstrated the potential of semi-automated transcription and linguistic analyses to provide reliable, detailed and informative narrative language analysis for young children, and of citizen-science-based approaches using mobile technologies to collect representative and informative research data. Clinical evaluation of this new app is ongoing, so data on its developmental or clinical sensitivity and specificity are not yet available.