Zhang Shaodian, Grave Edouard, Sklar Elizabeth, Elhadad Noémie
Department of Biomedical Informatics, Columbia University, New York, NY, USA.
King's College London, London, UK.
J Biomed Inform. 2017 May;69:1-9. doi: 10.1016/j.jbi.2017.03.012. Epub 2017 Mar 18.
Identifying the topics of discussions in online health communities (OHCs) is critical to various information extraction applications, but can be difficult because the topics of OHC content are usually heterogeneous and domain-dependent. In this paper, we provide a multi-class schema, an annotated dataset, and supervised classifiers based on a convolutional neural network (CNN) and other models for the task of classifying discussion topics. We apply the CNN classifier to the most popular online breast cancer community, and carry out cross-sectional and longitudinal analyses to show topic distributions and topic dynamics throughout members' participation. Our experimental results suggest that the CNN outperforms the other classifiers at topic classification, and our analyses identify several patterns and trajectories. For example, although members mainly discuss disease-related topics, their interests may change over time and vary with disease severity.
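The cross-sectional and longitudinal analyses mentioned in the abstract can be pictured with a minimal sketch. It assumes each post has already been assigned a topic by a classifier and is tagged with the number of weeks since its author joined. The topic labels, the sample posts, the 26-week period length, and the function names are all hypothetical, chosen for illustration, and are not the paper's schema or data.

```python
from collections import Counter, defaultdict

# Hypothetical classifier output: (member_id, weeks_since_joining, predicted_topic).
# Labels and values are invented for illustration, not taken from the paper.
posts = [
    ("m1", 0, "treatment"),
    ("m1", 1, "treatment"),
    ("m1", 30, "daily_life"),
    ("m2", 2, "diagnosis"),
    ("m2", 3, "treatment"),
    ("m2", 40, "daily_life"),
]


def topic_distribution(labels):
    """Return the share of each topic label in an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


def distribution_by_period(posts, weeks_per_period=26):
    """Group posts into periods of membership and compute per-period topic shares."""
    by_period = defaultdict(list)
    for _member, week, topic in posts:
        by_period[week // weeks_per_period].append(topic)
    return {p: topic_distribution(t) for p, t in sorted(by_period.items())}


# Cross-sectional view: topic shares over all posts.
overall = topic_distribution(topic for _, _, topic in posts)

# Longitudinal view: topic shares per period of membership.
per_period = distribution_by_period(posts)
```

Comparing `per_period[0]` with `per_period[1]` gives a rough picture of how the topic mix shifts as membership lengthens. A real analysis would also need to control for members who stop posting.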