Bifarin Olatomiwa O, Yelluru Varun S, Simhadri Aditya, Fernández Facundo M
School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
School of Computer Science, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
Anal Chem. 2025 Jul 15;97(27):14088-14096. doi: 10.1021/acs.analchem.5c01672. Epub 2025 Jul 3.
We present a comprehensive map of the metabolomics research landscape, synthesizing insights from over 80,000 publications. Using PubMedBERT, we transformed abstracts into 768-dimensional embeddings that capture the thematic structure of the field. Dimensionality reduction with t-SNE revealed distinct clusters corresponding to key domains, such as analytical chemistry, plant biology, pharmacology, and clinical diagnostics. In addition, a neural topic modeling pipeline refined with GPT-4o mini reclassified the corpus into 20 distinct topics, ranging from "Plant Stress Response Mechanisms" and "NMR Spectroscopy Innovations" to "COVID-19 Metabolomic and Immune Responses." Temporal analyses further highlight trends including the rise of deep learning methods post-2015 and a continued focus on biomarker discovery. Integration of metadata such as publication statistics and sample sizes provides additional context for these evolving research dynamics. An interactive web application (https://metascape.streamlit.app/) enables dynamic exploration of these insights. Overall, this study offers a robust framework for literature synthesis that helps researchers, clinicians, and policymakers identify emerging research trajectories and address critical challenges in metabolomics. We also share our perspectives on key trends shaping the field.
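The abstract does not say how the temporal analyses were computed. As a minimal sketch of one way such a trend could be quantified, the Python below computes, for each publication year, the fraction of abstracts that mention a keyword. The function name `yearly_keyword_share`, the keyword-matching approach, and the toy records are all assumptions made for illustration; they are not the study's method or data.

```python
from collections import Counter


def yearly_keyword_share(records, keyword):
    """Return {year: fraction of that year's abstracts containing keyword}.

    `records` is an iterable of (year, abstract_text) pairs.
    Matching is a case-insensitive substring test (an assumption of
    this sketch, not the paper's topic-modeling pipeline).
    """
    totals = Counter()  # abstracts per year
    hits = Counter()    # abstracts per year that mention the keyword
    needle = keyword.lower()
    for year, abstract in records:
        totals[year] += 1
        if needle in abstract.lower():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy records: hypothetical examples, NOT drawn from the study's corpus.
toy_records = [
    (2014, "NMR metabolomics of plant stress response."),
    (2014, "Biomarker discovery in serum by LC-MS."),
    (2018, "Deep learning for peak annotation in LC-MS data."),
    (2018, "Biomarker discovery in urine metabolomics."),
    (2022, "Deep learning models for metabolite identification."),
    (2022, "Deep learning and biomarker discovery in COVID-19 cohorts."),
]

shares = yearly_keyword_share(toy_records, "deep learning")
for year, share in shares.items():
    print(f"{year}: {share:.2f}")
```

On a real corpus, the same per-year normalization could be applied to topic labels rather than keyword matches, so that growth in one theme is not confounded by growth in total publication volume.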