BERTopic_Teen: a multi-module optimization approach for short text topic modeling in adolescent health.

Authors

Feng Yiqiang, Chen Ziao, Zhang Yuxin, Huang Wenyuan, Zhang Xuanming, He Siyu

Affiliations

School of Marxism, Sichuan Agricultural University, Chengdu, China.

College of Law, Sichuan Agricultural University, Yaan, China.

Publication information

Front Public Health. 2025 Aug 12;13:1608241. doi: 10.3389/fpubh.2025.1608241. eCollection 2025.

DOI: 10.3389/fpubh.2025.1608241
PMID: 40873978
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12378273/

Abstract

Adolescent health has become a critical dimension in the digital era, as social media platforms emerge as vital sources of real-time behavioral data for informing sustainable and equitable public health strategies. However, conventional topic modeling methods often struggle with the semantic sparsity and noise inherent in short-form texts. The study proposes BERTopic_Teen, an enhanced topic modeling framework optimized for adolescent health-related tweets. The model incorporates three key innovations: a Popularity Deviation Regularizer (PDR) to suppress high-frequency generic terms and amplify domain-specific vocabulary; a Dynamic Document Embedding Optimizer (DDEO) that adaptively selects optimal UMAP dimensions based on silhouette scores; and a Probabilistic Reassignment Matrix (PRM) to reassign outlier documents to relevant topic clusters. Using a dataset of 64,441 tweets (61,039 successfully classified), experimental results show that BERTopic_Teen outperforms LDA, NMF, Top2Vec, and the original BERTopic in all key evaluation metrics. It achieves a 16.1% improvement in topic coherence (NPMI = 0.2184), higher topic diversity (TD = 0.9935), and lower perplexity (1.7214), indicating superior semantic clarity, topic distinctiveness, and modeling stability. These findings suggest that BERTopic_Teen offers a robust solution for extracting meaningful topics from social media data and advancing public health surveillance.
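
The abstract reports topic coherence (NPMI) and topic diversity (TD) but does not say how they were computed. The sketch below shows the standard textbook definitions of both metrics on a toy corpus. It is an illustration only: the document-level co-occurrence counting, the `top_k` default, the function names, and the toy data are all assumptions and are not taken from the paper.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Share of unique words among the top_k words of every topic (1.0 = no overlap)."""
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs of one topic.

    Probabilities are estimated from document-level co-occurrence,
    which is an assumption of this sketch (sliding windows are also common).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: lower bound of NPMI
        elif p12 == 1:
            scores.append(1.0)   # words co-occur in every document: upper bound
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical tokenized tweets and topics, for illustration only.
toy_docs = [
    ["sleep", "anxiety", "school"],
    ["sleep", "anxiety"],
    ["vaping", "nicotine"],
    ["vaping", "nicotine", "school"],
]
toy_topics = [["sleep", "anxiety"], ["vaping", "nicotine"]]

print("TD:", topic_diversity(toy_topics, top_k=2))
print("NPMI:", [npmi_coherence(topic, toy_docs) for topic in toy_topics])
```

Under these definitions NPMI ranges from -1 (words never co-occur) to 1 (words always co-occur), and TD approaches 1 when topics share almost no top words, which is how the reported values NPMI = 0.2184 and TD = 0.9935 would be read.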

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41ce/12378273/4b027b8b4d38/fpubh-13-1608241-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41ce/12378273/c91839f70018/fpubh-13-1608241-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41ce/12378273/4820717eae0d/fpubh-13-1608241-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41ce/12378273/aea2c36b5e2b/fpubh-13-1608241-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41ce/12378273/2b1e0c83e0b5/fpubh-13-1608241-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41ce/12378273/34a97a4b9620/fpubh-13-1608241-g0006.jpg
