School of Nursing, Columbia University, New York, New York, USA.
Center for Home Care Policy & Research, VNS Health, New York, New York, USA.
J Am Med Inform Assoc. 2023 Sep 25;30(10):1673-1683. doi: 10.1093/jamia/ocad139.
Patient-clinician communication provides valuable explicit and implicit information that may indicate adverse medical conditions and outcomes. However, practical and analytical approaches for audio-recording and analyzing this data stream remain underexplored. This study aimed to 1) analyze patients' and nurses' speech in audio-recorded verbal communication, and 2) develop machine learning (ML) classifiers to effectively differentiate between patient and nurse language.
Pilot studies were conducted at VNS Health, the largest not-for-profit home healthcare agency in the United States, to optimize the audio-recording of patient-nurse interactions. We recorded and transcribed 46 interactions, resulting in 3494 "utterances" that were annotated to identify the speaker. We employed natural language processing techniques to generate linguistic features and built various ML classifiers to distinguish between patient and nurse language at both the individual-utterance and encounter levels.
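A minimal sketch of the utterance-level speaker classification step described above, assuming transcribed utterances annotated as "patient" or "nurse". The toy utterances, TF-IDF features, and linear SVM shown here are illustrative assumptions; the study's exact preprocessing, feature set, and hyperparameters are not reproduced.

```python
# Sketch: classify the speaker of each transcribed utterance (patient vs nurse).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy annotated utterances; in the study these come from transcribed home visits.
utterances = [
    ("How have you been feeling since we changed your medication?", "nurse"),
    ("Oh, about the same, I guess. My legs still swell up at night.", "patient"),
    ("Any dizziness when you stand up? Let me check your blood pressure.", "nurse"),
    ("No, but money's been tight, so I skipped a few pills this week.", "patient"),
]
texts, speakers = zip(*utterances)

# TF-IDF unigram/bigram features feeding a linear support vector machine.
speaker_clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    SVC(kernel="linear"),
)
speaker_clf.fit(texts, speakers)

# Label new, unannotated utterances from a transcribed encounter.
print(speaker_clf.predict([
    "Let me listen to your lungs before I go.",
    "I've been staying home most days, praying things get better.",
]))
```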
A support vector machine classifier trained on selected linguistic features from term frequency-inverse document frequency (TF-IDF), Linguistic Inquiry and Word Count (LIWC), Word2Vec, and Medical Concepts in the Unified Medical Language System (UMLS) achieved the highest performance, with an AUC-ROC of 99.01 ± 1.97 and an F1-score of 96.82 ± 4.1. The analysis revealed patients' tendency to use informal language and keywords related to "religion," "home," and "money," whereas nurses used more complex sentences focused on health and medical topics and were more likely to ask questions.
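A hedged sketch of how the feature families named above (TF-IDF alongside LIWC-style, Word2Vec, and UMLS-derived features) could be concatenated ahead of the SVM, and how fold-wise AUC-ROC and F1 can be summarized as mean ± SD. The `extra_features` matrix is a stand-in for precomputed LIWC, Word2Vec, and UMLS concept features, which require licensed or external tools and are not reproduced here.

```python
# Sketch: cross-validate an SVM on TF-IDF plus precomputed utterance features.
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def evaluate_speaker_classifier(texts, extra_features, labels, n_splits=5):
    """Report mean ± SD AUC-ROC and F1 across stratified folds (labels are 0/1)."""
    texts, labels = np.asarray(texts), np.asarray(labels)
    aucs, f1s = [], []
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in folds.split(texts, labels):
        # Fit TF-IDF on the training fold only, then append the extra features.
        tfidf = TfidfVectorizer()
        X_train = hstack([tfidf.fit_transform(texts[train_idx]),
                          extra_features[train_idx]])
        X_test = hstack([tfidf.transform(texts[test_idx]),
                         extra_features[test_idx]])
        clf = SVC(kernel="linear").fit(X_train, labels[train_idx])
        aucs.append(roc_auc_score(labels[test_idx], clf.decision_function(X_test)))
        f1s.append(f1_score(labels[test_idx], clf.predict(X_test)))
    print(f"AUC-ROC:  {np.mean(aucs):.2f} ± {np.std(aucs):.2f}")
    print(f"F1-score: {np.mean(f1s):.2f} ± {np.std(f1s):.2f}")
```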
The methods and analytical approach we developed to differentiate patient and nurse language are an important precursor for downstream tasks that aim to analyze patient speech to identify patients at risk of disease and negative health outcomes.