Location inference for hidden population with online text analysis.

Affiliations

College of Systems Engineering, National University of Defense Technology, Changsha, 410073, China.

School of Software Engineering, Shenzhen Institute of Information Technology, Shenzhen, 518172, China.

Publication information

Int J Health Geogr. 2020 Dec 9;19(1):57. doi: 10.1186/s12942-020-00245-x.

DOI:10.1186/s12942-020-00245-x
PMID:33298074
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7724834/
Abstract

BACKGROUND

Understanding the geographic distribution of hidden populations, such as men who have sex with men (MSM), sex workers, or injecting drug users, is of great importance for the adequate deployment of intervention strategies and for public health decision making. However, because such populations are hard to access (e.g., no sampling frame, sensitivity of the topic, reporting error), traditional survey methods are of limited use in studying them. Using data extracted from a very active online community of MSM in China, this study adopts and develops location inference methods to achieve high-resolution mapping of the community's users at the national level.

METHODS

We collect a comprehensive dataset from the largest MSM-related sub-community in Baidu Tieba, covering 628,360 MSM-related users. Based on users' publicly available posts, we evaluate and compare the performance of mainstream location inference algorithms on the problem of locating the Chinese MSM population online. To improve inference accuracy, we introduce other natural language processing approaches, such as context analysis and pattern recognition, into location extraction. In addition, we develop a hybrid voting algorithm (HVA-LI) in which different approaches vote to determine the best inference result, offering a more effective approach to location inference for hidden populations.
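The voting idea can be illustrated with a minimal sketch. It is not the authors' HVA-LI implementation: the toy gazetteer, the two regular-expression patterns (a simple stand-in for a trained NER model), the equal vote weights, and all function names are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: place name -> canonical city label.
# A real gazetteer would cover all administrative divisions of China.
GAZETTEER = {"长沙": "Changsha", "深圳": "Shenzhen", "北京": "Beijing"}

# Hypothetical context patterns standing in for a trained NER model.
# "坐标X" (roughly "location: X") and "人在X" ("I am in X") are examples only.
PATTERNS = [
    re.compile(r"坐标[::]?\s*(\S{2})"),
    re.compile(r"人在(\S{2})"),
]


def gazetteer_votes(posts):
    """Cast one vote per post for each gazetteer name the post mentions."""
    votes = Counter()
    for post in posts:
        for name, city in GAZETTEER.items():
            if name in post:
                votes[city] += 1
    return votes


def pattern_votes(posts):
    """Cast one vote for each pattern match that resolves to a known place."""
    votes = Counter()
    for post in posts:
        for pattern in PATTERNS:
            for candidate in pattern.findall(post):
                city = GAZETTEER.get(candidate)
                if city is not None:
                    votes[city] += 1
    return votes


def infer_location(posts, voters=(gazetteer_votes, pattern_votes)):
    """Sum the votes of all methods and return the top city, or None."""
    total = Counter()
    for voter in voters:
        total.update(voter(posts))  # Counter.update adds counts together
    if not total:
        return None
    return total.most_common(1)[0][0]


posts = ["坐标深圳,有人吗", "去年去过北京", "人在深圳求交友"]
print(infer_location(posts))
```

In this sketch a place that is merely mentioned (北京) collects only the gazetteer vote, while a place stated in a self-locating context (深圳) collects votes from both methods, which is the intuition behind letting several approaches vote.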

RESULTS

Comparing the performance of popular inference algorithms, we find that the classic gazetteer-based algorithm achieves better results than the alternatives. Among the HVA-LI variants, the hybrid of the simple gazetteer-based method and named entity recognition (NER) performs best at inferring user locations disclosed in short texts in online communities, improving inference accuracy from 50.3% to 71.3% on the MSM-related dataset.

CONCLUSIONS

In this study, we explore the feasibility of inferring location by analyzing textual content posted by online users. We propose a more effective hybrid algorithm, the Gazetteer & NER algorithm, which helps overcome the sparse location labeling problem in user profiles and can be extended to inferring geo-statistics for other hidden populations.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da37/7724834/17f143e8cc7f/12942_2020_245_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da37/7724834/3d5b5f7656ed/12942_2020_245_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da37/7724834/a1b89c37b5fc/12942_2020_245_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da37/7724834/c8e173fcc880/12942_2020_245_Fig4_HTML.jpg

Similar articles

1. Location inference for hidden population with online text analysis.
Int J Health Geogr. 2020 Dec 9;19(1):57. doi: 10.1186/s12942-020-00245-x.
2. Analyzing hidden populations online: topic, emotion, and social network of HIV-related users in the largest Chinese online community.
BMC Med Inform Decis Mak. 2018 Jan 5;18(1):2. doi: 10.1186/s12911-017-0579-1.
3. A Men Who Have Sex With Men-Friendly Doctor Finder Hackathon in Guangzhou, China: Development of a Mobile Health Intervention to Enhance Health Care Utilization.
JMIR Mhealth Uhealth. 2020 Feb 27;8(2):e16030. doi: 10.2196/16030.
4. A Cross-sectional Survey of HIV Transmission and Behavior among Men Who Have Sex with Men in Different Areas of Inner Mongolia Autonomous Region, China.
BMC Public Health. 2016 Nov 15;16(1):1161. doi: 10.1186/s12889-016-3809-z.
5. Contextualizing condoms: a cross-sectional study mapping intersections of locations of sexual contact, partner type, and substance use as contexts for sexual risk behavior among MSM in Peru.
BMC Infect Dis. 2019 Nov 11;19(1):958. doi: 10.1186/s12879-019-4517-y.
6. Protocol for a multicenter, real-world study of HIV pre-exposure prophylaxis among men who have sex with men in China (CROPrEP).
BMC Infect Dis. 2019 Aug 15;19(1):721. doi: 10.1186/s12879-019-4355-y.
7. [Status quo and characteristic analysis among MSM-users of the "Internet Plus-based AIDS Comprehensive Prevention Service System" in Guangzhou].
Zhonghua Liu Xing Bing Xue Za Zhi. 2019 Oct 10;40(10):1206-1211. doi: 10.3760/cma.j.issn.0254-6450.2019.10.007.
8. Identifying high risk subgroups of MSM: a latent class analysis using two samples.
BMC Infect Dis. 2019 Mar 5;19(1):213. doi: 10.1186/s12879-019-3700-5.
9. A comparison between respondent-driven sampling and time-location sampling among men who have sex with men in Shenzhen, China.
Arch Sex Behav. 2015 Oct;44(7):2055-65. doi: 10.1007/s10508-014-0350-y. Epub 2014 Sep 20.
10. Identifying MSM-competent physicians in China: a national online cross-sectional survey among physicians who see male HIV/STI patients.
BMC Health Serv Res. 2018 Dec 13;18(1):964. doi: 10.1186/s12913-018-3781-7.
