Department of Neuroradiology, MD Anderson Cancer Center, 1400 Pressler Street, Houston, TX, 77030, USA.
Department of Neurosurgery, Mount Sinai Health System, 1468 Madison Avenue, New York, NY, 10029, USA.
J Stroke Cerebrovasc Dis. 2024 Jun;33(6):107665. doi: 10.1016/j.jstrokecerebrovasdis.2024.107665. Epub 2024 Feb 25.
This study aims to demonstrate the capacity of natural language processing and topic modeling to manage and interpret the vast quantities of scholarly publications in the landscape of stroke research. These tools can expedite the literature review process, reveal hidden themes, and track rising research areas.
Our study involved reviewing and analyzing articles published in five prestigious stroke journals, namely Stroke, International Journal of Stroke, European Stroke Journal, Translational Stroke Research, and Journal of Stroke and Cerebrovascular Diseases. The team extracted document titles, abstracts, publication years, and citation counts from the Scopus database. BERTopic was chosen as the topic modeling technique. Using linear regression models, current stroke research trends were identified. Python 3.1 was used to analyze and visualize data.
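The trend-detection step described above can be illustrated with a minimal sketch. The abstract does not say what the regression was fitted on, so this example assumes one ordinary least-squares line per topic, with publication year as the predictor and the topic's share of that year's documents as the response. The function names (`ols_slope`, `topic_trends`) and the toy records are hypothetical and are not taken from the study; in practice the topic labels would come from a topic model such as BERTopic.

```python
from collections import Counter, defaultdict


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # a single year carries no trend information
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def topic_trends(records):
    """Map each topic to the yearly change in its share of documents.

    `records` is an iterable of (year, topic) pairs, one per document.
    Assumption: trends are measured on per-year shares, not raw counts.
    """
    docs_per_year = Counter()
    docs_per_topic_year = defaultdict(Counter)
    for year, topic in records:
        docs_per_year[year] += 1
        docs_per_topic_year[topic][year] += 1

    years = sorted(docs_per_year)
    slopes = {}
    for topic, counts in docs_per_topic_year.items():
        # A Counter returns 0 for years in which the topic never appears.
        shares = [counts[y] / docs_per_year[y] for y in years]
        slopes[topic] = ols_slope(years, shares)
    return slopes


# Invented toy data: topic "A" grows while topic "B" shrinks.
toy_records = (
    [(2020, "A")] * 1 + [(2020, "B")] * 3
    + [(2021, "A")] * 2 + [(2021, "B")] * 2
    + [(2022, "A")] * 3 + [(2022, "B")] * 1
)
slopes = topic_trends(toy_records)
ranked = sorted(slopes, key=slopes.get, reverse=True)
```

Sorting topics by slope gives a simple ranking from rising to falling. Using shares rather than raw counts is a design choice that keeps overall growth in publication volume from making every topic look like it is trending.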
Out of the 35,779 documents collected, 26,732 were classified into 30 categories and used for analysis. "Animal Models," "Rehabilitation," and "Reperfusion Therapy" were identified as the three most prevalent topics. Linear regression models identified "Emboli," "Medullary and Cerebellar Infarcts," and "Glucose Metabolism" as trending topics, whereas "Cerebral Venous Thrombosis," "Statins," and "Intracerebral Hemorrhage" demonstrated a weaker trend.
The methodology can assist researchers, funders, and publishers by documenting the evolution and specialization of topics. The findings illustrate the significance of animal models, the expansion of rehabilitation research, and the centrality of reperfusion therapy. Limitations include coverage of only five journals and a reliance on high-quality metadata.