Doing-Harris Kristina, Mowery Danielle L, Daniels Chrissy, Chapman Wendy W, Conway Mike
Westminster College, Salt Lake City, UT.
Department of Biomedical Informatics, University of Utah, Salt Lake City, UT.
AMIA Annu Symp Proc. 2017 Feb 10;2016:524-533. eCollection 2016.
Important information is encoded in free-text patient comments. We determine the most common topics in patient comments, design automatic topic classifiers, identify comments' sentiment, and find new topics in negative comments. Our annotation scheme consisted of 28 topics, each with positive and negative sentiment. Within those 28 topics, the seven most frequent accounted for 63% of annotations. For automated topic classification, we developed vocabulary-based and Naive Bayes classifiers. For sentiment analysis, another Naive Bayes classifier was used. Finally, we used topic modeling to search for unexpected topics within negative comments. The seven most common topics were appointment access, appointment wait, empathy, explanation, friendliness, practice environment, and overall experience. The best F-measures from our classifiers were 0.52 (NB), 0.57 (NB), 0.36 (Vocab), 0.74 (NB), 0.40 (NB), and 0.44 (Vocab), respectively. F-scores ranged from 0.16 to 0.74. The sentiment classification F-score was 0.84. Negative comment topic modeling revealed complaints about appointment access, appointment wait, and time spent with physician.
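As an illustration of the two technical ingredients named in the abstract, the sketch below trains a small multinomial Naive Bayes topic classifier and computes an F-measure for one topic. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify tokenization, features, or smoothing, so whitespace tokens and add-one smoothing are assumed here, and the example comments, the `NaiveBayes` class, and the `f_measure` helper are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens; the abstract does not name the features used.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of comments per topic
        self.word_counts = defaultdict(Counter)  # word frequencies per topic
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f_measure(gold, predicted, positive):
    """Harmonic mean of precision and recall for one topic label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy comments, labeled with two of the topics named in the abstract.
train_texts = [
    "waited an hour past my appointment",
    "long wait in the lobby",
    "doctor explained everything clearly",
    "clear explanation of my results",
]
train_labels = ["appointment wait", "appointment wait", "explanation", "explanation"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("the wait was long")
```

With a per-topic F-measure like this, each topic can be scored separately, which is how a single classifier family can yield a range of scores across topics.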