Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
Digital Reasoning Systems, Inc., Nashville, TN, USA.
J Biomed Inform. 2017 Oct;74:59-70. doi: 10.1016/j.jbi.2017.08.014. Epub 2017 Aug 30.
Patients communicate with healthcare providers via secure messaging in patient portals. As patient portal adoption increases, growing messaging volumes may overwhelm providers. Prior research has shown promise in automatically classifying patient portal messages into communication types to support message triage or answering. This paper examines whether using semantic features and word context improves portal message classification.
Portal messages were classified into the following categories: informational, medical, social, and logistical. We constructed features from portal messages, including bag of words, bag of phrases, graph representations, and word embeddings. We trained one-versus-all random forest and logistic regression classifiers, as well as a convolutional neural network (CNN) with a softmax output. We evaluated each classifier's performance using the area under the curve (AUC).
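The simpler feature constructions and the one-versus-all setup can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the regex tokenizer, the use of bigrams as stand-in "phrases", and the helper names `tokenize`, `bag_of_words`, `bag_of_phrases`, and `one_vs_all_labels` are all hypothetical. It also assumes each message is annotated with a set of category names, so that each one-versus-all classifier sees a binary target for its own category.

```python
import re
from collections import Counter

# Category names come from the abstract; everything else is a hypothetical sketch.
CATEGORIES = ["informational", "medical", "social", "logistical"]


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text):
    """Token counts with word order discarded."""
    return Counter(tokenize(text))


def bag_of_phrases(text, n=2):
    """Counts of contiguous n-token sequences (bigrams by default).

    Assumption: 'phrases' are approximated by n-grams here; the paper may
    extract phrases differently.
    """
    tokens = tokenize(text)
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def one_vs_all_labels(message_labels, category):
    """Binary targets for one category: 1 if the message carries it, else 0."""
    return [1 if category in labels else 0 for labels in message_labels]


if __name__ == "__main__":
    message = "Please refill my prescription, please."
    print(bag_of_words(message))
    print(bag_of_phrases(message))
    # Hypothetical annotations: one set of categories per message.
    annotations = [{"medical", "logistical"}, {"social"}, {"logistical"}]
    for category in CATEGORIES:
        print(category, one_vs_all_labels(annotations, category))
```

Each category then gets its own binary classifier (random forest or logistic regression) trained on these targets, whereas the CNN's softmax output scores all four categories at once.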
With messages represented as bag of words, the random forest detected informational, medical, social, and logistical communications in patient portal messages with AUCs of 0.803, 0.884, 0.828, and 0.928, respectively. Graph representations of messages outperformed simpler features, with AUCs of 0.837, 0.914, 0.846, and 0.884 for informational, medical, social, and logistical communication, respectively. Representing words with Word2Vec embeddings and mapping the features with a CNN yielded the best performance, with AUCs of 0.908 for informational, 0.917 for medical, 0.935 for social, and 0.943 for logistical categories.
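Each AUC above is reported per category, treating that category as the positive class. Assuming the curve is the receiver operating characteristic (ROC) curve, as is conventional, its area equals the probability that a randomly chosen positive message receives a higher score than a randomly chosen negative one, with ties counted as half. The pure-Python sketch below uses that pairwise definition for readability (it is quadratic in the number of messages); the function names and toy data are hypothetical.

```python
def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 0/1 targets for one category; scores: classifier outputs.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_category_auc(message_labels, scores_by_category):
    """One-versus-all AUC for each category.

    message_labels: list of sets of category names, one set per message.
    scores_by_category: dict mapping category -> list of scores, one per message.
    """
    return {
        cat: auc([1 if cat in labels else 0 for labels in message_labels], scores)
        for cat, scores in scores_by_category.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data, not results from the paper.
    labels = [{"medical"}, {"social"}, {"medical", "social"}]
    scores = {"medical": [0.9, 0.2, 0.7], "social": [0.4, 0.8, 0.3]}
    print(per_category_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values reported above should be read.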
Word2Vec and graph representations improved the accuracy of classifying portal messages compared with features that lack semantic information, such as bag of words and bag of phrases. Furthermore, combining Word2Vec with a CNN model, which provides a higher-order representation, improved the classification of portal messages.
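To see what a graph representation captures that a bag of words does not, consider two messages built from the same words in a different order. The sketch below assumes a simple graph-of-words construction (nodes are tokens, and a directed edge links each token to the tokens that follow it within a sliding window). The abstract does not say how the paper's graphs were built, so this construction, the `window` parameter, the helper names, and the example sentences are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def graph_of_words(text, window=2):
    """Directed co-occurrence graph stored as an edge -> count map.

    Assumption (not from the paper): an edge (u, v) is counted each time
    token v occurs within `window - 1` positions after token u.
    """
    tokens = tokenize(text)
    edges = Counter()
    for i, u in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            edges[(u, tokens[j])] += 1
    return edges


if __name__ == "__main__":
    a = "No pain, only swelling."
    b = "Only pain, no swelling."
    # The two messages have identical bags of words...
    print(Counter(tokenize(a)) == Counter(tokenize(b)))
    # ...but different word-context graphs.
    print(graph_of_words(a) == graph_of_words(b))
```

The graph keeps local word order, so "no pain" and "no swelling" become distinct edges even though both messages contain the same four words.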