Analyzing the field of bioinformatics with the multi-faceted topic modeling technique.

Authors

Heo Go Eun, Kang Keun Young, Song Min, Lee Jeong-Hoon

Affiliations

Department of Library and Information Science, Yonsei University, 50 Yonsei-ro Seodaemun-gu, Seoul, 03722, Republic of Korea.

Department of Creative IT Engineering, POSTECH, 77 Cheongam-ro Nam-gu, Pohang, Gyeongbuk, 37673, Republic of Korea.

Publication information

BMC Bioinformatics. 2017 May 31;18(Suppl 7):251. doi: 10.1186/s12859-017-1640-x.

Abstract

BACKGROUND

Bioinformatics is an interdisciplinary field at the intersection of molecular biology and computing technology. To characterize the field as a convergent domain, researchers have used bibliometrics augmented with text-mining techniques for content analysis. In previous studies, Latent Dirichlet Allocation (LDA) was the most representative topic modeling technique for identifying the topic structure of subject areas. However, LDA displays only a simple topic structure; it does not reveal how topics relate to metadata such as authors, publication dates, and journals.

METHODS

In this paper, we adopt Tang et al.'s Author-Conference-Topic (ACT) model to study the field of bioinformatics from the perspective of keyphrases, authors, and journals. The ACT model can incorporate the paper, author, and conference into the topic distribution simultaneously. To obtain more meaningful results, we use journals and keyphrases instead of conferences and bag-of-words. For the analysis, we used PubMed to collect forty-six bioinformatics journals from the MEDLINE database. We conducted time series topic analysis over four periods from 1996 to 2015 to further examine the interdisciplinary nature of bioinformatics.
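To make the adapted model concrete, the following is a minimal sketch of an ACT-style generative process with keyphrases in place of words and journals in place of conferences. It assumes the commonly described form of the model, in which each token of a paper is assigned an author chosen uniformly from the paper's author list, a topic drawn from that author's topic distribution, and then a keyphrase and a journal drawn from that topic. The function name `sample_paper`, the parameter names `theta`, `phi`, and `psi`, and all toy values are hypothetical; this illustrates only the sampling direction, not the inference procedure used in the paper.

```python
import random

def sample_paper(authors, n_tokens, theta, phi, psi, rng):
    """Generate (author, topic, keyphrase, journal) tuples for one paper.

    theta[a]: list of topic probabilities for author a (assumed given)
    phi[z]:   dict mapping keyphrase -> probability under topic z
    psi[z]:   dict mapping journal -> probability under topic z
    """
    tokens = []
    for _ in range(n_tokens):
        # Pick one of the paper's authors uniformly at random.
        author = rng.choice(authors)
        # Draw a topic from that author's topic distribution.
        topic = rng.choices(range(len(theta[author])), weights=theta[author])[0]
        # Draw a keyphrase and a journal from the chosen topic.
        keyphrase = rng.choices(list(phi[topic]), weights=list(phi[topic].values()))[0]
        journal = rng.choices(list(psi[topic]), weights=list(psi[topic].values()))[0]
        tokens.append((author, topic, keyphrase, journal))
    return tokens

# Toy, made-up parameters for two topics (not taken from the paper).
theta = {"author_a": [0.9, 0.1], "author_b": [0.2, 0.8]}
phi = [
    {"sequence alignment": 0.7, "gene expression": 0.3},
    {"protein structure": 0.6, "network analysis": 0.4},
]
psi = [
    {"Journal X": 0.8, "Journal Y": 0.2},
    {"Journal X": 0.3, "Journal Y": 0.7},
]

rng = random.Random(0)  # fixed seed so repeated runs give the same sample
paper = sample_paper(["author_a", "author_b"], 5, theta, phi, psi, rng)
for token in paper:
    print(token)
```

In practice the distributions `theta`, `phi`, and `psi` are unknown and must be estimated from the corpus; the sketch only shows how authors, keyphrases, and journals are tied together through a shared topic.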

RESULTS

We analyze the ACT model results in each period. Additionally, for further integrated analysis, we conduct a time series analysis among the top-ranked keyphrases, journals, and authors according to their frequency. We also examine the patterns in the top journals by simultaneously identifying the topical probability in each period, as well as the top authors and keyphrases. The results indicate that in recent years diversified topics have become more prevalent and convergent topics have become more clearly represented.
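The frequency-based time series analysis can be sketched as counting items per period and ranking them. The example below is an illustration only: the abstract states that there are four periods between 1996 and 2015 but does not give their boundaries, so the five-year windows in `PERIODS` are an assumption, and the records and the function names `period_of` and `top_keyphrases_by_period` are hypothetical.

```python
from collections import Counter

# Assumed five-year windows; the abstract does not specify the boundaries.
PERIODS = [(1996, 2000), (2001, 2005), (2006, 2010), (2011, 2015)]

def period_of(year):
    """Return the (start, end) period containing year, or None if out of range."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    return None

def top_keyphrases_by_period(records, k=2):
    """Count keyphrases per period and return the k most frequent in each."""
    counts = {period: Counter() for period in PERIODS}
    for year, keyphrases in records:
        period = period_of(year)
        if period is not None:  # skip records outside 1996-2015
            counts[period].update(keyphrases)
    return {period: counter.most_common(k) for period, counter in counts.items()}

# Made-up records of (publication year, keyphrases); not data from the paper.
records = [
    (1998, ["sequence alignment", "gene prediction"]),
    (1999, ["sequence alignment"]),
    (2012, ["systems biology", "network analysis"]),
    (2014, ["systems biology"]),
    (1990, ["out of range"]),
]

trends = top_keyphrases_by_period(records, k=1)
for period, top in trends.items():
    print(period, top)
```

The same counting applies to journals or authors by replacing the keyphrase lists with journal names or author names.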

CONCLUSION

The results of our analysis imply that, over time, the field of bioinformatics has become more interdisciplinary, with a steady increase in peripheral fields such as conceptual, mathematical, and systems biology. These results are confirmed by integrated analysis of the topic distribution as well as the top-ranked keyphrases, authors, and journals.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8ed/5471940/446ccf8f42bc/12859_2017_1640_Fig1_HTML.jpg
