Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan.
Lifescience AI Business Division, Research Development Department, FRONTEO Inc, Tokyo, Japan.
Sci Rep. 2022 Aug 3;12(1):12461. doi: 10.1038/s41598-022-16204-4.
In recent years, several studies have reported the use of natural language processing (NLP) to identify dementia. Most of these studies used picture description or similar tasks to elicit spontaneous speech, but free conversation, which requires no task, may be easier to carry out in a clinical setting. Moreover, free conversation is unlikely to induce a learning effect. The purpose of this study was therefore to develop a machine learning model that discriminates subjects with and without dementia by extracting features from unstructured free conversation data using NLP. We recruited patients who visited a specialized outpatient clinic for dementia, as well as healthy volunteers. Participants' conversations were transcribed, and the text was decomposed from natural sentences into morphemes by morphological analysis and then converted into real-valued vectors that served as features for machine learning. A total of 432 datasets were used, and the resulting model classified dementia and non-dementia subjects with an accuracy of 0.900, a sensitivity of 0.881, and a specificity of 0.916. Using sentence vector information, it was possible to develop a machine learning algorithm capable of discriminating dementia from non-dementia subjects with high accuracy based on free conversation.
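The feature-extraction and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer stands in for morphological analysis, the normalized word counts stand in for the real-valued vectors, and the function names (tokenize, vectorize, evaluate) and the toy labels are hypothetical. Only the definitions of accuracy, sensitivity, and specificity are standard.

```python
from collections import Counter


def tokenize(text):
    # Assumption: a lowercase whitespace split stands in for the
    # morphological analysis used in the study.
    return text.lower().split()


def vectorize(tokens, vocabulary):
    # Assumption: relative frequencies over a fixed vocabulary stand in
    # for the real-valued feature vectors described in the abstract.
    counts = Counter(tokens)
    total = sum(counts[word] for word in vocabulary) or 1
    return [counts[word] / total for word in vocabulary]


def evaluate(y_true, y_pred):
    # Label 1 = dementia, label 0 = non-dementia.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of dementia cases detected
        "specificity": tn / (tn + fp),  # share of non-dementia cases cleared
    }


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    features = vectorize(tokenize("I went went home"), ["went", "home", "store"])
    print(features)
    print(evaluate([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

In this sketch, sensitivity is the fraction of dementia subjects classified as dementia and specificity is the fraction of non-dementia subjects classified as non-dementia, which is how the reported values of 0.881 and 0.916 would be read.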