Identifying Medication-Related Intents From a Bidirectional Text Messaging Platform for Hypertension Management Using an Unsupervised Learning Approach: Retrospective Observational Pilot Study.

Author information

Department of Biostatistics, Epidemiology, and Informatics, Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, United States.

Division of General Internal Medicine, Department of Medicine, The Ohio State University Wexner Medical Center, Columbus, OH, United States.

Publication information

J Med Internet Res. 2022 Jun 29;24(6):e36151. doi: 10.2196/36151.

Abstract

BACKGROUND

Free-text communication between patients and providers plays an increasing role in chronic disease management, through platforms ranging from traditional health care portals to novel mobile messaging apps. These text data are rich resources for clinical purposes, but their sheer volume renders them difficult to manage. Even automated approaches, such as natural language processing, require labor-intensive manual classification to develop training data sets. Automated approaches to organizing free-text data are necessary to facilitate the use of free-text communication for clinical care.

OBJECTIVE

The aim of this study was to apply unsupervised learning approaches to (1) understand the types of topics discussed and (2) learn medication-related intents from messages sent between patients and providers through a bidirectional text messaging system for managing participant blood pressure (BP).

METHODS

This study was a secondary analysis of deidentified messages from a remote, mobile, text-based employee hypertension management program at an academic institution. We trained a latent Dirichlet allocation (LDA) model for each message type (ie, inbound patient messages and outbound provider messages) and identified the distribution of major topics and significant topics (probability >.20) across message types. Next, we annotated all medication-related messages with a single medication intent. Then, we trained a second medication-specific LDA (medLDA) model to assess how well the unsupervised method could identify more fine-grained medication intents. We encoded each medication message with n-grams (n=1-3 words) using spaCy, clinical named entities using Stanza, and medication categories using MedEx; we then applied chi-square feature selection to learn the most informative features associated with each medication intent.
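The chi-square feature selection step can be illustrated with a minimal sketch. It assumes whitespace tokenization and a presence/absence 2x2 test of each n-gram against one intent; the study itself used spaCy, Stanza, and MedEx to build its features, none of which is reproduced here. The function names, the example messages, and the intent labels are hypothetical and are not taken from the study data.

```python
from collections import Counter


def ngrams(text, n_max=3):
    """Return the set of word n-grams (n = 1..n_max) in a lowercased message."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi_square_scores(messages, labels, target):
    """Score each n-gram with a 2x2 chi-square statistic against one intent."""
    total = len(messages)
    n_target = sum(1 for label in labels if label == target)
    with_feature = Counter()         # messages containing the n-gram
    with_feature_target = Counter()  # ...that also carry the target intent
    for text, label in zip(messages, labels):
        for gram in ngrams(text):
            with_feature[gram] += 1
            if label == target:
                with_feature_target[gram] += 1
    scores = {}
    for gram, n_feature in with_feature.items():
        a = with_feature_target[gram]  # n-gram present, target intent
        b = n_feature - a              # n-gram present, other intent
        c = n_target - a               # n-gram absent, target intent
        d = total - n_feature - c      # n-gram absent, other intent
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[gram] = total * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical messages and intent labels, for illustration only.
messages = [
    "can i get a refill of my lisinopril",
    "please send a refill to my pharmacy",
    "i need a refill this week",
    "are you taking your medication daily",
    "did you take your medication today",
    "have you started the new medication",
]
labels = ["request", "request", "request", "question", "question", "question"]

scores = chi_square_scores(messages, labels, target="request")
for gram, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{score:.2f}  {gram}")
```

The statistic is symmetric, so an n-gram that never appears with the target intent (here, "medication" for the request intent) scores as high as one that appears only with it. Inspecting the direction of the association is a separate step.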

RESULTS

In total, 253 participants and 5 providers engaged in the program, generating 12,131 total messages: 46.90% (n=5689) patient messages and 53.10% (n=6442) provider messages. Most patient messages corresponded to BP reporting, BP encouragement, and appointment scheduling; most provider messages corresponded to BP reporting, medication adherence, and confirmatory statements. Most patient and provider messages contained 1 topic and few contained more than 3 topics identified using LDA. In total, 534 medication messages were annotated with a single medication intent. Of these, 282 (52.8%) were patient medication messages: most referred to the medication request intent (n=134, 47.5%). Most of the 252 (47.2%) provider medication messages referred to the medication question intent (n=173, 68.7%). Although the medLDA model could identify a majority intent within each topic, it could not distinguish medication intents with low prevalence within patient or provider messages. Richer feature engineering identified informative lexical-semantic patterns associated with each medication intent class.
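The per-message topic counts depend on the threshold given in the Methods (probability >.20). The sketch below shows how such counts could be derived from per-message topic distributions produced by an LDA model. The distributions are hypothetical, the helper names are illustrative, and treating the "major" topic as the single highest-probability topic is an assumption rather than a definition stated in the abstract.

```python
from collections import Counter

SIGNIFICANT = 0.20  # threshold stated in the Methods (probability > .20)


def major_topic(distribution):
    """Index of the highest-probability topic (assumed meaning of 'major')."""
    return max(range(len(distribution)), key=lambda k: distribution[k])


def significant_topics(distribution, threshold=SIGNIFICANT):
    """Indices of all topics whose probability strictly exceeds the threshold."""
    return [k for k, p in enumerate(distribution) if p > threshold]


# Hypothetical per-message topic distributions; each row sums to 1.
doc_topics = [
    [0.85, 0.05, 0.05, 0.05],
    [0.45, 0.40, 0.10, 0.05],
    [0.25, 0.25, 0.25, 0.25],
]

# Number of messages containing 1, 2, 3, ... significant topics.
topics_per_message = Counter(len(significant_topics(row)) for row in doc_topics)
print(dict(topics_per_message))
```

Because the probabilities for one message sum to 1, at most 4 topics can each exceed .20, so "more than 3 topics" under this threshold can only mean exactly 4.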

CONCLUSIONS

LDA can be an effective method for generating subgroups of messages with similar term usage and for facilitating the review of topics to inform annotations. However, the small number of training cases and the vocabulary shared between intents preclude the use of LDA for fully automated, deep medication intent classification.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1101/2021.12.23.21268061.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6364/9280462/fbe301562265/jmir_v24i6e36151_fig1.jpg
