Toward Automating HIV Identification: Machine Learning for Rapid Identification of HIV-Related Social Media Data.

Authors

Young Sean D, Yu Wenchao, Wang Wei

Affiliations

*Department of Family Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA; †University of California Institute for Prediction Technology, University of California, Los Angeles, CA; and ‡Department of Computer Science, University of California, Los Angeles, CA.

Publication information

J Acquir Immune Defic Syndr. 2017 Feb 1;74 Suppl 2(Suppl 2):S128-S131. doi: 10.1097/QAI.0000000000001240.

Abstract

INTRODUCTION

"Social big data" from technologies such as social media, wearable devices, and online searches continue to grow and can be used as tools for HIV research. Although researchers can uncover patterns and insights associated with HIV trends and transmission, the review process is time consuming and resource intensive. Machine learning methods derived from computer science might be used to assist HIV domain experts by learning how to rapidly and accurately identify patterns associated with HIV from a large set of social data.

METHODS

Using an existing social media data set that was associated with HIV and coded by an HIV domain expert, we tested whether 4 commonly used machine learning methods could learn the patterns associated with HIV risk behavior. We used the 10-fold cross-validation method to examine the speed and accuracy of these models in applying that knowledge to detect HIV content in social media data.
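The abstract gives no implementation details, so the following is only a minimal sketch of what a 10-fold cross-validation run that records accuracy and elapsed time could look like. Everything in it is an assumption made for illustration: the helper names (`k_fold_indices`, `cross_validate`), the toy `TinyLogisticRegression` class, and the synthetic posts are hypothetical and are not the authors' code or data. A real analysis would more likely rely on an established library such as scikit-learn.

```python
import math
import random
import time
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class TinyLogisticRegression:
    """Bag-of-words logistic regression fit by plain SGD (illustrative only)."""

    def __init__(self, epochs=20, lr=0.5):
        self.epochs, self.lr = epochs, lr
        self.weights, self.bias = Counter(), 0.0

    def _probability(self, text):
        z = self.bias + sum(self.weights[t] for t in set(text.lower().split()))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, texts, labels):
        self.weights, self.bias = Counter(), 0.0
        for _ in range(self.epochs):
            for text, y in zip(texts, labels):
                error = y - self._probability(text)  # log-likelihood gradient
                for token in set(text.lower().split()):
                    self.weights[token] += self.lr * error
                self.bias += self.lr * error
        return self

    def predict(self, texts):
        return [int(self._probability(t) >= 0.5) for t in texts]


def cross_validate(make_model, texts, labels, k=10, seed=0):
    """Return (accuracy over all held-out items, elapsed seconds)."""
    folds = k_fold_indices(len(texts), k, seed)
    correct = 0
    start = time.perf_counter()
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(texts)) if i not in held]
        # Train on the other k-1 folds, then score the held-out fold.
        model = make_model().fit([texts[i] for i in train],
                                 [labels[i] for i in train])
        predictions = model.predict([texts[i] for i in held_out])
        correct += sum(p == labels[i] for p, i in zip(predictions, held_out))
    return correct / len(texts), time.perf_counter() - start


# Synthetic placeholder posts invented for this sketch; 1 = coded HIV-related.
hiv_terms = ["hiv", "testing", "prep", "clinic", "status"]
other_terms = ["game", "dinner", "traffic", "movie", "weather"]
texts, labels = [], []
for i in range(10):
    a, b = i % 5, (i + 1 + i // 5) % 5
    texts.append(f"post about {hiv_terms[a]} and {hiv_terms[b]}")
    labels.append(1)
    texts.append(f"post about {other_terms[a]} and {other_terms[b]}")
    labels.append(0)

accuracy, seconds = cross_validate(TinyLogisticRegression, texts, labels, k=10)
print(f"10-fold accuracy: {accuracy:.3f}, elapsed: {seconds:.4f} s")
```

Because `cross_validate` takes a model factory, any classifier exposing `fit` and `predict` could be passed in and compared using the same folds, which is one plausible way to compare several methods on both speed and accuracy.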

RESULTS AND DISCUSSION

Logistic regression and random forest resulted in the highest accuracy in detecting HIV-related social data (85.3%), whereas the Ridge Regression Classifier resulted in the lowest accuracy. Logistic regression yielded the fastest processing time (16.98 seconds).

CONCLUSIONS

Machine learning can enable social big data to become a new and important tool in HIV research, helping to create a new field of "digital HIV epidemiology." If a domain expert can identify patterns in social data associated with HIV risk or HIV transmission, machine learning models could quickly and accurately learn those associations and identify potential HIV patterns in large social data sets.
