PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning.

Affiliation

Big Data Analytics and Web Intelligence Laboratory, Department of Computer Science & Engineering, Delhi Technological University, New Delhi, India.

Publication information

Comput Biol Med. 2021 Nov;138:104920. doi: 10.1016/j.compbiomed.2021.104920. Epub 2021 Oct 12.

DOI:10.1016/j.compbiomed.2021.104920
PMID:34655902
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8505021/
Abstract

The recent outbreak of novel coronavirus disease, or COVID-19, has been declared a pandemic by the World Health Organization (WHO). The availability of social media platforms has played a vital role in providing and obtaining information about any ongoing event. However, consuming a vast amount of online textual data to predict an event's trends can be troublesome. To our knowledge, no study analyzes online news articles together with disease data about coronavirus disease. Therefore, we propose an LDA-based topic model, called PAN-LDA (Pandemic-Latent Dirichlet allocation), that incorporates COVID-19 case data and news articles into common LDA to obtain a new set of features. The generated features are introduced as additional features to machine learning (ML) algorithms to improve the forecasting of time series data. Furthermore, we employ collapsed Gibbs sampling (CGS) as the underlying technique for parameter inference. The results from experiments suggest that the features obtained from PAN-LDA generate more identifiable topics and empirically add value to the outcome.
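The abstract names collapsed Gibbs sampling as the inference technique but gives no implementation details. The following is a minimal sketch of CGS for plain LDA only, not the authors' PAN-LDA: it does not use the case data, and the function name `lda_collapsed_gibbs`, the hyperparameter defaults, and the toy corpus are all assumptions made for illustration. Each token's topic is resampled from a conditional proportional to (document-topic count + alpha) × (topic-word count + beta) / (topic total + V × beta), with the token's own assignment removed from the counts first.

```python
import numpy as np


def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative, not PAN-LDA).

    docs: list of lists of integer word ids in [0, vocab_size).
    Returns (theta, phi): document-topic and topic-word estimates.
    """
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    ndk = np.zeros((n_docs, n_topics))      # topic counts per document
    nkw = np.zeros((n_topics, vocab_size))  # word counts per topic
    nk = np.zeros(n_topics)                 # total tokens per topic
    z = []                                  # topic assignment of every token

    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional over topics, up to a constant.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under the newly sampled topic.
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1

    # Posterior mean estimates from the final counts.
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (nkw + beta) / (nk[:, None] + vocab_size * beta)
    return theta, phi


# Toy corpus (hypothetical): three documents over a six-word vocabulary.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 1, 3, 4]]
theta, phi = lda_collapsed_gibbs(docs, n_topics=2, vocab_size=6, n_iters=50)
print(theta.shape, phi.shape)  # (3, 2) (2, 6)
```

Each row of `theta` is a per-document topic distribution. In a setting like the one the abstract describes, such rows could be appended as extra columns to the inputs of a forecasting model, though how PAN-LDA combines them with case counts is not stated in the abstract.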

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/a3d735e640b9/gr7_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/fc624f15abae/gr1_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/53fef98bd7cc/gr2_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/311b7a11f392/gr3_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/c784546edd52/fx1_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/4494eff883d3/fx2_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/cfc1bdcd378f/fx3_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/5dafb3d8bb92/gr4_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/7f4ebb373e20/gr5_lrg.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86c9/8505021/101d940892e6/gr6_lrg.jpg

Similar articles

1
PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning.
Comput Biol Med. 2021 Nov;138:104920. doi: 10.1016/j.compbiomed.2021.104920. Epub 2021 Oct 12.
2
Mining topic and sentiment dynamics in physician rating websites during the early wave of the COVID-19 pandemic: Machine learning approach.
Int J Med Inform. 2021 May;149:104434. doi: 10.1016/j.ijmedinf.2021.104434. Epub 2021 Feb 26.
3
Topics, Trends, and Sentiments of Tweets About the COVID-19 Pandemic: Temporal Infoveillance Study.
J Med Internet Res. 2020 Oct 23;22(10):e22624. doi: 10.2196/22624.
4
Concerns Expressed by Chinese Social Media Users During the COVID-19 Pandemic: Content Analysis of Sina Weibo Microblogging Data.
J Med Internet Res. 2020 Nov 26;22(11):e22152. doi: 10.2196/22152.
5
Exploring COVID-19-Related Stressors: Topic Modeling Study.
J Med Internet Res. 2022 Jul 13;24(7):e37142. doi: 10.2196/37142.
6
Twitter Discussions and Emotions About the COVID-19 Pandemic: Machine Learning Approach.
J Med Internet Res. 2020 Nov 25;22(11):e20550. doi: 10.2196/20550.
7
Social Media Insights Into US Mental Health During the COVID-19 Pandemic: Longitudinal Analysis of Twitter Data.
J Med Internet Res. 2020 Dec 14;22(12):e21418. doi: 10.2196/21418.
8
Topic Analysis of Traditional and Social Media News Coverage of the Early COVID-19 Pandemic and Implications for Public Health Communication.
Disaster Med Public Health Prep. 2022 Oct;16(5):1881-1888. doi: 10.1017/dmp.2021.65. Epub 2021 Mar 3.
9
Emergency Physician Twitter Use in the COVID-19 Pandemic as a Potential Predictor of Impending Surge: Retrospective Observational Study.
J Med Internet Res. 2021 Jul 14;23(7):e28615. doi: 10.2196/28615.
10
Top Concerns of Tweeters During the COVID-19 Pandemic: Infoveillance Study.
J Med Internet Res. 2020 Apr 21;22(4):e19016. doi: 10.2196/19016.

Cited by

1
Optimizing forensic file classification: enhancing SFCS with β hyperparameter tuning.
PeerJ Comput Sci. 2025 Mar 5;11:e2608. doi: 10.7717/peerj-cs.2608. eCollection 2025.
2
Assessment of the Efficiency of a ChatGPT-Based Tool, MyGenAssist, in an Industry Pharmacovigilance Department for Case Documentation: Cross-Over Study.
J Med Internet Res. 2025 Mar 10;27:e65651. doi: 10.2196/65651.
3
Prognostic prediction models for postoperative patients with stage I to III colorectal cancer based on machine learning.
World J Gastrointest Oncol. 2024 Dec 15;16(12):4597-4613. doi: 10.4251/wjgo.v16.i12.4597.
4
How perceived sustainability influences consumers' clothing preferences.
Sci Rep. 2024 Nov 19;14(1):28672. doi: 10.1038/s41598-024-80279-4.
5
Hot Topic Recognition of Health Rumors Based on Anti-Rumor Articles on the WeChat Official Account Platform: Topic Modeling.
J Med Internet Res. 2023 Sep 21;25:e45019. doi: 10.2196/45019.
6
Lightweight deep CNN-based models for early detection of COVID-19 patients from chest X-ray images.
Expert Syst Appl. 2023 Aug 1;223:119900. doi: 10.1016/j.eswa.2023.119900. Epub 2023 Mar 18.
7
A deep feature-level fusion model for masked face identity recommendation system.
J Ambient Intell Humaniz Comput. 2022 Sep 19:1-14. doi: 10.1007/s12652-022-04380-0.

References

1
Gaussian hierarchical latent Dirichlet allocation: Bringing polysemy back.
PLoS One. 2023 Jul 12;18(7):e0288274. doi: 10.1371/journal.pone.0288274. eCollection 2023.
2
Time series analysis of hemorrhagic fever with renal syndrome in mainland China by using an XGBoost forecasting model.
BMC Infect Dis. 2021 Aug 19;21(1):839. doi: 10.1186/s12879-021-06503-y.
3
COVID-19 information retrieval with deep-learning based semantic search, question answering, and abstractive summarization.
NPJ Digit Med. 2021 Apr 12;4(1):68. doi: 10.1038/s41746-021-00437-0.
4
Data analysis of Covid-19 pandemic and short-term cumulative case forecasting using machine learning time series methods.
Chaos Solitons Fractals. 2021 Jan;142:110512. doi: 10.1016/j.chaos.2020.110512. Epub 2020 Nov 28.
5
Analysis of spatiotemporal characteristics of big data on social media sentiment with COVID-19 epidemic topics.
Chaos Solitons Fractals. 2020 Nov;140:110123. doi: 10.1016/j.chaos.2020.110123. Epub 2020 Jul 17.
6
Machine learning approaches to predict peak demand days of cardiovascular admissions considering environmental exposure.
BMC Med Inform Decis Mak. 2020 May 1;20(1):83. doi: 10.1186/s12911-020-1101-8.
7
Prescription Function Prediction Using Topic Model and Multilabel Classifiers.
Evid Based Complement Alternat Med. 2017;2017:8279109. doi: 10.1155/2017/8279109. Epub 2017 Oct 11.
8
Support vector machine with adaptive parameters in financial time series forecasting.
IEEE Trans Neural Netw. 2003;14(6):1506-18. doi: 10.1109/TNN.2003.820556.
9
Finding scientific topics.
Proc Natl Acad Sci U S A. 2004 Apr 6;101 Suppl 1(Suppl 1):5228-35. doi: 10.1073/pnas.0307752101. Epub 2004 Feb 10.