Saha Budhaditya, Lisboa Sanal, Ghosh Shameek
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:5847-5850. doi: 10.1109/EMBC44109.2020.9175577.
In clinical conversational applications, extracted entities tend to capture the main subject of a patient's complaint, namely symptoms or diseases. However, they mostly fail to recognize the characterizations of a complaint, such as its time, onset, and severity. For example, given the input "I have a headache and it is extreme", state-of-the-art models recognize only the main symptom entity (headache) and ignore the severity descriptor (extreme) that characterizes it. In this paper, we design a two-fold approach to detect the characterizations of entities such as symptoms, as presented by general users in contexts where they would describe their symptoms to a clinician. We use Word2Vec and BERT models to encode the clinical text given by patients. We transform the output and re-frame the task as a multi-label classification problem. Finally, we combine the processed encodings with the Linear Discriminant Analysis (LDA) algorithm to classify the characterizations of the main entity. Experimental results demonstrate that our method achieves a 40-50% improvement in accuracy over state-of-the-art models.
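The multi-label framing mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical label set built from the three characterizations the abstract names (time, onset, severity) and shows only how annotated utterances might be turned into binary target vectors. The helper name `to_multi_hot`, the label vocabulary, and the toy utterances are illustrative assumptions, not the authors' actual label scheme or data, and the encoding and LDA steps are not shown.

```python
# Hypothetical label set, taken from the characterizations named in the abstract.
CHARACTERIZATIONS = ["time", "onset", "severity"]


def to_multi_hot(labels, vocabulary=CHARACTERIZATIONS):
    """Return a 0/1 vector with one position per characterization."""
    unknown = set(labels) - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in vocabulary]


# Toy annotated utterances (illustrative only, not from the paper's data).
examples = [
    ("I have a headache and it is extreme", {"severity"}),
    ("My back pain started suddenly two days ago", {"time", "onset"}),
    ("I have a cough", set()),
]

# One binary target vector per utterance; several positions may be 1 at once,
# which is what distinguishes multi-label from single-label classification.
targets = [to_multi_hot(labels) for _, labels in examples]
print(targets)  # [[0, 0, 1], [1, 1, 0], [0, 0, 0]]
```

In a setup like this, each utterance would be paired with its sentence encoding, and a classifier would predict every position of the target vector independently, so an utterance can carry zero, one, or several characterizations.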