Analysis of Persian Bioinformatics Research with Topic Modeling.

Affiliations

Department of Scientometrics, Faculty of Social Sciences, Yazd University, Yazd, Iran.

School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran.

Publication Information

Biomed Res Int. 2023 Apr 17;2023:3728131. doi: 10.1155/2023/3728131. eCollection 2023.

Abstract

PURPOSE

As a scientific field, bioinformatics has drawn remarkable attention in recent years from fields such as information technology, mathematics, and modern biological sciences. With the rapid accumulation of biological datasets, topic models, which originate in natural language processing, have become a focus of attention. This research therefore aimed to model the topical content of the bioinformatics literature published by Iranian researchers and indexed in the Scopus citation database. This was a descriptive-exploratory study whose population comprised 3899 papers indexed in Scopus up to March 9, 2022. Topic modeling was performed on the titles and abstracts of these papers using a combination of latent Dirichlet allocation (LDA) and term frequency-inverse document frequency (TF-IDF) weighting. The analysis identified seven main topics: "Molecular Modeling," "Gene Expression," "Biomarker," "Coronavirus," "Immunoinformatics," "Cancer Bioinformatics," and "Systems Biology." "Systems Biology" formed the largest cluster and "Coronavirus" the smallest.

CONCLUSION

The present investigation showed that the LDA algorithm performed acceptably in classifying the topics of this field. The extracted topic clusters showed excellent consistency and strong topical connections with one another.
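The abstract states only that LDA and TF-IDF were combined and applied to titles and abstracts. It does not say how the two were combined, which library or inference method was used, or how cluster sizes were measured. The Python sketch that follows is therefore an illustration under stated assumptions, not the authors' pipeline. It uses TF-IDF to select the vocabulary (one common way to pair the two methods), fits LDA to raw term counts with collapsed Gibbs sampling, and sizes each cluster by counting the documents whose dominant topic it is. All function names, hyperparameters (`alpha`, `beta`, `iters`, `top_k`), and the toy titles are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, split on non-letters, drop tokens of <= 2 chars.
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [w for w in cleaned.split() if len(w) > 2]


def tfidf_vocabulary(docs, top_k):
    # Hypothetical TF-IDF step: rank terms by their highest TF-IDF score in any
    # document and keep the top_k terms as the LDA vocabulary.
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return sorted(best, key=lambda t: (-best[t], t))[:top_k]


def lda_gibbs(docs, vocab, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    # LDA fitted by collapsed Gibbs sampling on raw counts of vocabulary terms.
    rng = random.Random(seed)
    index = {w: i for i, w in enumerate(vocab)}
    corpus = [[index[w] for w in doc if w in index] for doc in docs]
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in corpus]          # tokens per (document, topic)
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # tokens per (topic, word)
    nk = [0] * n_topics                             # tokens per topic
    z = []                                          # topic assigned to every token

    def update(d, w, k, delta):
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta

    # Random initial topic assignments.
    for d, doc in enumerate(corpus):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, 1)

    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token, then resample its topic
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                           for t in range(n_topics)]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, w, z[d][i], 1)

    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
             for d, doc in enumerate(corpus)]
    return theta, nkw


def cluster_sizes(theta):
    # Assumption: a document belongs to the cluster of its dominant topic.
    return Counter(max(range(len(row)), key=row.__getitem__) for row in theta)


def top_words(nkw, vocab, n=3):
    # Most frequent vocabulary terms per topic, used to label topics by hand.
    return [[vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
            for row in nkw]


# Hypothetical toy titles, not data from the study.
titles = [
    "Molecular docking of inhibitors against protease",
    "Docking and molecular dynamics of protease inhibitors",
    "Gene expression profiling identifies biomarker genes",
    "Differential gene expression and biomarker discovery",
    "Molecular dynamics simulation of receptor binding",
    "Expression biomarker panel for tumor genes",
]
docs = [tokenize(t) for t in titles]
vocab = tfidf_vocabulary(docs, top_k=12)
theta, nkw = lda_gibbs(docs, vocab, n_topics=2)
print(cluster_sizes(theta))
print(top_words(nkw, vocab))
```

On real data one would typically fit several values of `n_topics` and compare them, for example with a topic-coherence score, before settling on a number such as the seven topics reported here. For a corpus of 3899 titles and abstracts, a library implementation such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` would be more practical than this pure-Python sampler.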

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56b3/10125747/fe7755962f33/BMRI2023-3728131.002.jpg