Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA.
Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, USA.
J Healthc Eng. 2017;2017:2460174. doi: 10.1155/2017/2460174. Epub 2017 Aug 3.
Online healthcare forums (OHFs) have become increasingly popular places for patients to share their health-related experiences. The healthcare-related texts posted in OHFs could help doctors and patients better understand specific diseases and the situations of other patients. A common way to extract the meaning of a post is to classify its sentences into several predefined semantic categories. However, the unstructured form of online posts poses challenges for existing classification algorithms. In addition, although sophisticated classification models such as deep neural networks may have good predictive power, the models and their predictions are hard to interpret, and interpretability is critical in healthcare applications. To tackle these challenges, we propose an effective and interpretable OHF post classification framework. Specifically, we classify sentences into three classes: medication, symptom, and background. Each sentence is projected into an interpretable feature space consisting of labeled sequential patterns, UMLS semantic types, and other heuristic features. A forest-based model is developed for categorizing OHF posts. An interpretation method is also developed, in which decision rules can be explicitly extracted to gain insight into the useful information in the texts. Experimental results on real-world OHF data demonstrate the effectiveness of the proposed computational framework.
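To make the idea of an interpretable feature space with explicit decision rules concrete, the following is a minimal sketch in Python, not the authors' implementation. Everything in it is a stated assumption: `TOY_SEMANTIC_TYPES` is a hypothetical stand-in for a UMLS semantic-type lookup, `TOY_PATTERNS` stands in for mined labeled sequential patterns, and the hand-written `RULES` with majority voting stand in for rules that the paper extracts from a trained forest-based model.

```python
import re
from collections import Counter

# Hypothetical stand-in for a UMLS semantic-type lookup (assumption, not real UMLS data).
TOY_SEMANTIC_TYPES = {
    "ibuprofen": "pharmacologic_substance",
    "metformin": "pharmacologic_substance",
    "headache": "sign_or_symptom",
    "nausea": "sign_or_symptom",
}

# Hypothetical patterns standing in for mined labeled sequential patterns.
TOY_PATTERNS = {
    "pat_take_dose": re.compile(r"\b(take|took|taking)\b.*\b\d+\s*mg\b"),
    "pat_feel": re.compile(r"\b(feel|felt|feeling)\b"),
}


def featurize(sentence):
    """Project a sentence into a small, human-readable feature dictionary."""
    text = sentence.lower()
    tokens = re.findall(r"[a-z]+", text)
    types = Counter(
        TOY_SEMANTIC_TYPES[t] for t in tokens if t in TOY_SEMANTIC_TYPES
    )
    # Binary pattern features: 1 if the pattern occurs in the sentence.
    features = {name: int(bool(p.search(text))) for name, p in TOY_PATTERNS.items()}
    # Semantic-type counts and one simple heuristic feature.
    features["n_drug_terms"] = types["pharmacologic_substance"]
    features["n_symptom_terms"] = types["sign_or_symptom"]
    features["n_tokens"] = len(tokens)
    return features


# Hand-written rules (assumption): (readable condition, predicate, predicted class).
RULES = [
    ("pat_take_dose == 1", lambda f: f["pat_take_dose"] == 1, "medication"),
    ("n_drug_terms >= 1", lambda f: f["n_drug_terms"] >= 1, "medication"),
    ("n_symptom_terms >= 1", lambda f: f["n_symptom_terms"] >= 1, "symptom"),
    ("pat_feel == 1", lambda f: f["pat_feel"] == 1, "symptom"),
]


def classify(sentence):
    """Return (label, supporting rule descriptions); default to 'background'."""
    features = featurize(sentence)
    fired = [(desc, label) for desc, pred, label in RULES if pred(features)]
    if not fired:
        return "background", []
    votes = Counter(label for _, label in fired)
    label = votes.most_common(1)[0][0]
    return label, [desc for desc, rule_label in fired if rule_label == label]


if __name__ == "__main__":
    for s in ["I took 200 mg of ibuprofen last night.",
              "I felt nausea all day.",
              "I am 45 years old."]:
        print(s, "->", classify(s))
```

Because every feature has a readable name and every prediction comes with the conditions that fired, a reader can trace why a sentence was labeled medication, symptom, or background. A learned forest would replace the fixed rule list with rules derived from training data.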