Abba Mustapha, Nduka Chidozie, Anjorin Seun, Mohamed Shukri, Agogo Emmanuel, Uthman Olalekan
Warwick Centre for Global Health, Division of Health Sciences, University of Warwick Medical School, University of Warwick, Coventry, United Kingdom.
Country Office Nigeria, Resolve to Save Lives, Abuja, Nigeria.
JMIR Form Res. 2022 May 18;6(5):e31292. doi: 10.2196/31292.
Owing to scientific and technical advances in the field, the body of published hypertension research has grown substantially over the last decade. Given the volume of scientific material published in this field, identifying the relevant information is difficult. We therefore used topic modeling, a powerful approach for extracting useful information from large amounts of unstructured text.
This study aims to use a machine learning algorithm to uncover hidden topics and subtopics in 100 years of peer-reviewed hypertension publications and to identify their temporal trends.
We examined the titles and abstracts of hypertension papers indexed in PubMed. We used a latent Dirichlet allocation (LDA) model to identify 20 primary topics and then ran a trend analysis to assess how their popularity changed over time.
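The LDA step can be illustrated with a minimal collapsed Gibbs sampler. This is a sketch for intuition only: the abstract does not say which LDA implementation, text preprocessing, or hyperparameters the study used, so the function name `lda_gibbs`, the Dirichlet priors `alpha` and `beta`, the iteration count, and the toy corpus are all hypothetical. The study's 20 topics would correspond to `num_topics=20`; a pure-Python sampler like this one is far too slow for a corpus of 581,750 abstracts, where an optimized library implementation would normally be used.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Hypothetical helper for illustration; not the study's code.
    docs: list of token lists (e.g., words from a title plus abstract).
    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the vocabulary order used by phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates; each row sums to 1.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


if __name__ == "__main__":
    # Hypothetical toy corpus: two preclinical-style and two treatment-style texts.
    docs = [
        "renin angiotensin receptor rat kidney".split(),
        "angiotensin receptor rat vascular".split(),
        "blood pressure trial drug treatment".split(),
        "drug trial treatment outcome".split(),
    ]
    theta, phi, vocab = lda_gibbs(docs, num_topics=2, iterations=100)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
        print(k, [vocab[v] for v in top])
```

Each row of `theta` gives one article's mixture over topics; these per-document proportions are what a downstream trend analysis would aggregate by publication year.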
We gathered 581,750 hypertension-related research articles published from 1900 to 2018 and grouped them into 20 topics. These topics were broadly categorized as preclinical, epidemiology, complications, and therapy studies. Topic 2 (evidence review) and topic 19 (major cardiovascular events) were the key hot topics. Most of the cardiopulmonary disease subtopics showed little variation over time and accounted for only small proportions of the corpus. The majority of the articles (414,206/581,750; 71.2%) had a negative sentiment valency, followed by positive (119,841/581,750; 20.6%) and neutral valency (47,704/581,750; 8.2%). Between 1980 and 2000, negative-sentiment articles fell somewhat, while positive- and neutral-sentiment articles rose substantially.
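The abstract does not define how hot topics were identified. One common approach, sketched here under that assumption, averages each topic's document proportions (the per-article topic mixtures, `theta`, produced by an LDA model) within each publication year and fits a least-squares line against year: a positive slope marks a rising ("hot") topic and a negative slope a declining ("cold") one. The function names are hypothetical, and the sketch omits the significance test a full analysis would usually apply.

```python
from collections import defaultdict


def yearly_topic_means(theta, years):
    """Average each topic's proportion over the documents published in each year.

    Hypothetical helper; theta[i] is document i's topic mixture, years[i] its year.
    """
    sums = defaultdict(lambda: [0.0] * len(theta[0]))
    counts = defaultdict(int)
    for row, year in zip(theta, years):
        acc = sums[year]
        for k, p in enumerate(row):
            acc[k] += p
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (needs at least two distinct xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def topic_trends(theta, years):
    """One slope per topic: positive = rising ("hot"), negative = falling ("cold")."""
    means = yearly_topic_means(theta, years)
    xs = list(means)  # publication years in ascending order
    num_topics = len(theta[0])
    return [ols_slope(xs, [means[y][k] for y in xs]) for k in range(num_topics)]
```

Averaging within each year before fitting keeps years with many publications from dominating the slope, which matters for a corpus whose annual output grew by orders of magnitude.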
The number of publications increased exponentially over the study period. Most of the uncovered topics can be grouped into four categories (ie, preclinical, epidemiology, complications, and treatment-related studies).
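One simple way to check a claim of exponential growth is to fit a straight line to the natural logarithm of annual publication counts: if the fit is good, the slope r is the continuous annual growth rate and ln(2)/r is the doubling time in years. The abstract does not state how growth was assessed, so this is an illustrative sketch; the function name `exponential_growth_rate` and the synthetic counts in the usage example are hypothetical, not the study's data.

```python
import math


def exponential_growth_rate(years, counts):
    """Fit log(count) = a + r * year by least squares (hypothetical helper).

    Returns (r, doubling_time): r is the continuous annual growth rate and
    ln(2) / r the number of years for the count to double. Years with zero
    publications are skipped because log(0) is undefined; r must be nonzero.
    """
    pts = [(y, math.log(c)) for y, c in zip(years, counts) if c > 0]
    n = len(pts)
    mx = sum(y for y, _ in pts) / n
    my = sum(v for _, v in pts) / n
    sxx = sum((y - mx) ** 2 for y, _ in pts)
    sxy = sum((y - mx) * (v - my) for y, v in pts)
    r = sxy / sxx
    return r, math.log(2) / r


if __name__ == "__main__":
    # Synthetic counts that double exactly every 5 years (not the study's data).
    years = list(range(1990, 2011))
    counts = [100 * 2 ** ((y - 1990) / 5) for y in years]
    r, doubling = exponential_growth_rate(years, counts)
    print(f"doubling time: {doubling:.2f} years")  # prints "doubling time: 5.00 years"
```

A visibly curved plot of log counts against year would indicate that a single exponential rate does not describe the whole period.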