Chen Qian, Ai Ni, Liao Jie, Shao Xin, Liu Yufeng, Fan Xiaohui
School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, 325035 China.
Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 China.
Chin Med. 2017 Sep 12;12:27. doi: 10.1186/s13020-017-0148-7. eCollection 2017.
The biomedical literature contains a wealth of valuable scientific findings, but they are widely scattered. Topic modeling enables researchers to discover themes in an unstructured collection of documents without any prior annotations or labels. In this paper, taking ginseng as an example, we propose a biological dynamic topic model (Bio-DTM) to conduct a retrospective study and interpret the temporal evolution of ginseng research.
The Bio-DTM system comprises four main components: document preprocessing, bio-dictionary construction, dynamic topic modeling, and topic analysis and visualization. Scientific articles on ginseng were retrieved from PubMed by text mining. The bio-dictionary integrates the MedTerms medical dictionary, the second edition of the Side Effect Resource (SIDER 2), A Dictionary of Biology, and the HGNC database of human gene names. A dynamic topic model, a text-mining technique designed to capture how topics develop across a sequentially collected document set, was then applied. Beyond the content of each topic, the evolution of topics over time was visualized with ThemeRiver.
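The abstract does not spell out the model, so the following is a minimal sketch assuming Bio-DTM follows the standard dynamic topic model of Blei and Lafferty. In that model each topic's natural parameters drift between time slices as a Gaussian random walk, beta_t ~ N(beta_{t-1}, sigma^2 I), and the topic's word distribution in slice t is softmax(beta_t). The toy vocabulary, starting parameters, `sigma` and seed below are hypothetical; the code only simulates the drift and does not perform the posterior inference needed to fit a model to PubMed abstracts.

```python
import math
import random

# Hypothetical toy vocabulary; in Bio-DTM the vocabulary would be drawn
# from bio-dictionary terms found in the retrieved abstracts.
VOCAB = ["ginsenoside", "cultivation", "allelopathy", "cardioprotective", "supplement"]


def softmax(values):
    """Map unconstrained natural parameters to a probability distribution."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def evolve_topic(beta0, n_slices, sigma, seed=0):
    """Simulate one topic across time slices.

    Each slice adds independent N(0, sigma^2) noise to every word's natural
    parameter (the DTM state-space assumption) and returns the per-slice
    word distributions softmax(beta_t).
    """
    rng = random.Random(seed)
    betas = [list(beta0)]
    for _ in range(1, n_slices):
        betas.append([b + rng.gauss(0.0, sigma) for b in betas[-1]])
    return [softmax(b) for b in betas]


def top_word(dist, vocab=VOCAB):
    """Most probable word of a topic's word distribution."""
    return vocab[max(range(len(dist)), key=dist.__getitem__)]


if __name__ == "__main__":
    dists = evolve_topic([0.0, 1.0, 0.5, -0.5, 0.2], n_slices=5, sigma=0.3)
    for t, dist in enumerate(dists):
        print(t, top_word(dist), [round(p, 3) for p in dist])
```

A small `sigma` keeps a topic's vocabulary stable from slice to slice, while a larger one lets it shift; that chaining is what lets the same topic be tracked through successive publication years.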
Topic 9 showed that ginseng is used in dietary supplements and in complementary and integrative health practices, and that it has been very popular since the early twentieth century. Topic 6 indicated that ginseng cultivation is a major research area, with the symbiosis and allelopathy of ginseng becoming a research hotspot in 2007. In addition, Bio-DTM provided insight into the main pharmacological effects of ginseng, including anti-metabolic-disorder, cardioprotective, anticancer, hepatoprotective, antithrombotic and neuroprotective effects.
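Statements such as "a research hotspot in 2007" rest on each topic's prevalence per year, which is also what a ThemeRiver chart stacks as stream widths. The sketch below assumes prevalence is the sum of document-topic proportions within each year and takes a topic's hotspot to be the year with its largest share; both the aggregation and the hotspot criterion are illustrative assumptions, not the paper's stated procedure, and the five documents are hypothetical toy data.

```python
from collections import defaultdict


def topic_streams(docs, n_topics):
    """Sum each topic's proportion over the documents of each year.

    docs: iterable of (year, theta) pairs, where theta is a document's
    topic-proportion vector of length n_topics.
    Returns {year: [weight of topic 0, ..., weight of topic n_topics-1]},
    ordered by year, i.e. the stream widths a ThemeRiver-style chart stacks.
    """
    streams = defaultdict(lambda: [0.0] * n_topics)
    for year, theta in docs:
        row = streams[year]
        for k, p in enumerate(theta):
            row[k] += p
    return dict(sorted(streams.items()))


def peak_year(streams, topic, normalize=True):
    """Year in which `topic` has its largest weight (as a share of that year if normalize)."""
    def weight(year):
        row = streams[year]
        return row[topic] / sum(row) if normalize else row[topic]
    return max(streams, key=weight)


# Hypothetical toy data: (publication year, topic proportions for 2 topics).
docs = [
    (2005, [0.7, 0.3]),
    (2006, [0.6, 0.4]),
    (2007, [0.2, 0.8]),
    (2007, [0.3, 0.7]),
    (2008, [0.5, 0.5]),
]
streams = topic_streams(docs, n_topics=2)
print(streams)
print(peak_year(streams, topic=1))
```

Normalizing by each year's total keeps a year with many publications from looking like a hotspot for every topic at once; the unnormalized sum instead reflects absolute publication volume, which is what a ThemeRiver's stream thickness usually shows.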
Bio-DTM not only reveals what ginseng research covers but also shows how these topics evolve over time. This approach can be applied across the biomedical field to conduct retrospective studies and guide future research.