Text classification for assisting moderators in online health communities.

Affiliation

Department of Telecommunication, Information Studies, and Media, Michigan State University, 404 Wilson Rd, Rm 409, East Lansing, MI 48864, USA.

Publication information

J Biomed Inform. 2013 Dec;46(6):998-1005. doi: 10.1016/j.jbi.2013.08.011. Epub 2013 Sep 8.

DOI: 10.1016/j.jbi.2013.08.011

PMID: 24025513

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3874858/

Abstract

OBJECTIVES

Patients increasingly visit online health communities to get help with managing their health. The large scale of these communities makes it impossible for moderators to engage in all conversations; yet some conversations need their expertise. Our work explores applying low-cost text classification methods to the new domain of determining whether a thread in an online health forum needs moderators' help.

METHODS

We employed a binary classifier on WebMD's online diabetes community data. To train the classifier, we considered three feature types: (1) word unigrams, (2) sentiment analysis features, and (3) thread length. We applied feature selection based on χ² statistics, and used undersampling to account for unbalanced data. We then performed a qualitative error analysis to investigate the appropriateness of the gold standard.
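The χ² feature selection and undersampling steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the classifier or the tooling, so the function names (`chi2_score`, `select_unigrams`, `undersample`), the whitespace tokenizer, and the toy threads are all assumptions made for this example. Undersampling would normally be applied to the training split only.

```python
import random
from collections import Counter

def chi2_score(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: positive threads containing the term
    n10: negative threads containing the term
    n01: positive threads without the term
    n00: negative threads without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_unigrams(docs, labels, k):
    """Return the k unigrams with the highest chi-square score."""
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    pos_df, neg_df = Counter(), Counter()
    for text, label in zip(docs, labels):
        # Hypothetical tokenizer: lowercase and split on whitespace.
        for term in set(text.lower().split()):
            (pos_df if label else neg_df)[term] += 1
    scores = {}
    for term in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[term], neg_df[term]
        scores[term] = chi2_score(n11, n10, pos_total - n11, neg_total - n10)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def undersample(docs, labels, seed=0):
    """Randomly drop majority-class examples until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [docs[i] for i in keep], [labels[i] for i in keep]

# Invented toy threads; label 1 means "needs a moderator's help".
docs = [
    "my insulin dose seems wrong please help",
    "great recipe thanks",
    "is this medication dose safe",
    "happy friday everyone",
    "thanks for sharing",
    "nice walk today",
]
labels = [1, 0, 1, 0, 0, 0]

top_terms = select_unigrams(docs, labels, 1)
balanced_docs, balanced_labels = undersample(docs, labels)
```

In this toy set, "dose" occurs in both positive threads and in no negative thread, so it receives the highest χ² score; the balanced sample keeps both positive threads and two randomly chosen negative ones.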

RESULTS

Using sentiment analysis features, feature selection methods, and balanced training data increased the AUC to as much as 0.75 and the F1-score to as much as 0.54, compared with a baseline of word unigrams with no feature selection on unbalanced data (0.65 AUC and 0.40 F1-score). The error analysis uncovered additional reasons why moderators respond to patients' posts.
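For readers unfamiliar with the two reported metrics, the sketch below shows one standard way to compute them. The labels and scores are invented for illustration and are unrelated to the study's data; `f1_score` and `auc` are names chosen for this example. AUC is computed here as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 1 = thread needs a moderator, scores are classifier outputs.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cutoff

toy_f1 = f1_score(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

Note that F1 depends on the chosen cutoff, whereas AUC summarizes ranking quality across all cutoffs, which is why the two numbers can move independently.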

DISCUSSION

We showed how feature selection methods and balanced training data can improve the overall classification performance. We present implications of weighing precision versus recall for assisting moderators of online health communities. Our error analysis uncovered social, legal, and ethical issues around addressing community members' needs. We also note challenges in producing a gold standard, and discuss potential solutions for addressing these challenges.
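The precision versus recall trade-off mentioned above can be illustrated by moving the decision threshold on a classifier's scores. The sketch below uses invented labels and scores, and `precision_recall` is a hypothetical helper; it only shows the general mechanism, not the operating point the authors recommend.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall for label 1 when flagging every score >= threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    # Convention used in this sketch: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented example: 1 = thread needs a moderator.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

strict = precision_recall(y_true, scores, 0.65)   # few threads flagged
lenient = precision_recall(y_true, scores, 0.35)  # more threads flagged
```

With the strict threshold every flagged thread truly needs help, but one such thread is missed; with the lenient threshold no thread is missed, at the cost of one unnecessary flag for the moderator to review.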

CONCLUSION

Social media environments provide popular venues in which patients gain health-related information. Our work contributes to understanding scalable solutions for providing moderators' expertise in these large-scale, social media environments.
