OBJECTIVE

Despite growing interest in the application of machine learning (ML) in proteomics, a comprehensive and systematic mapping of this research domain has been lacking. This study addresses this gap by conducting the first large-scale bibliometric analysis focused exclusively on ML-driven proteomics, aiming to elucidate its knowledge structure, development trajectory, and emerging research trends.

METHODS

A total of 5,156 publications from the Web of Science Core Collection (1997-2024) were retrieved and analyzed. Bibliometric tools including CiteSpace 6.4.R1, VOSviewer 1.6.18, Scimago Graphica, and the R package bibliometrix were used to extract and visualize key bibliometric indicators. After data cleaning and de-duplication, analyses were conducted on keyword co-occurrence, citation networks, leading journals, influential authors, and institutional collaboration patterns to construct a comprehensive landscape of ML applications in proteomics.
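The abstract names the tools but not their internals. The sketch below shows the kind of counting that underlies de-duplication and a keyword co-occurrence map. It assumes de-duplication by case-insensitive DOI and keyword normalization by lowercasing. The records are hypothetical toy data, not the study's corpus, and the function names are illustrative. The study's actual thesaurus, thresholds and counting method in VOSviewer or CiteSpace are not stated.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records for illustration only (not data from the study).
records = [
    {"doi": "10.1000/a1", "keywords": ["Machine Learning", "Proteomics", "Deep Learning"]},
    {"doi": "10.1000/A1", "keywords": ["machine learning", "proteomics"]},  # same DOI, different case
    {"doi": "10.1000/b2", "keywords": ["Deep Learning", "Protein Structure Prediction"]},
    {"doi": "10.1000/c3", "keywords": ["Machine Learning", "Proteomics", "Multi-omics"]},
]

def deduplicate(recs):
    """Keep the first record seen for each DOI, compared case-insensitively."""
    seen = set()
    unique = []
    for rec in recs:
        key = rec["doi"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def cooccurrence(recs):
    """Count, for each unordered keyword pair, how many records list both keywords."""
    pairs = Counter()
    for rec in recs:
        # Normalize and de-duplicate keywords within a record, then sort so each pair has one canonical order.
        kws = sorted({kw.strip().lower() for kw in rec["keywords"]})
        pairs.update(combinations(kws, 2))
    return pairs

unique = deduplicate(records)
counts = cooccurrence(unique)
for (a, b), n in counts.most_common(3):
    print(f"{a} -- {b}: {n}")
```

Each pair counts at most once per record, which corresponds to "full counting" of co-occurrences. A mapping tool would then apply a minimum-occurrence threshold and cluster the resulting network.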

RESULTS

The number of publications has grown exponentially since 2010, with an average annual growth rate of 12.53% and a notable surge of 65.14% between 2019 and 2020. The United States was the most productive country, and the Chinese Academy of Sciences was the leading institution. AlphaFold2-related research received the most citations, reflecting the transformative role of deep learning in protein structure prediction. Thematic clustering revealed key research foci, including deep learning algorithms, protein-protein interaction prediction, and integrative multi-omics analysis. The field is strongly interdisciplinary, drawing on computer science, molecular biology, and clinical research. High-impact journals and influential authors were also identified, providing benchmarks for academic influence and collaboration.
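The abstract does not define how its two growth figures were computed. One plausible reading is that the 12.53% figure is a compound annual growth rate (CAGR) over the whole 1997-2024 window, which to our understanding is how bibliometrix reports "Annual Growth Rate". The 65.14% surge would then be a simple year-over-year change from 2019 to 2020. The sketch below contrasts the two under that assumption. The function names and counts are hypothetical and are not the study's data.

```python
def compound_annual_growth_rate(first_count, last_count, n_years):
    """CAGR in percent across n_years calendar years (first and last year inclusive)."""
    if first_count <= 0 or n_years < 2:
        raise ValueError("need a positive starting count and at least two years")
    # n_years calendar years span n_years - 1 yearly intervals.
    return ((last_count / first_count) ** (1 / (n_years - 1)) - 1) * 100

def year_over_year_growth(prev_count, curr_count):
    """Percentage change in publication count from one year to the next."""
    return (curr_count - prev_count) / prev_count * 100

# Hypothetical counts for illustration only.
print(round(year_over_year_growth(200, 330), 2))  # 65.0
print(round(compound_annual_growth_rate(10, 1000, 28), 2))  # 1997-2024 inclusive is 28 years
```

The two measures answer different questions. CAGR smooths the whole period into one constant yearly rate. The year-over-year figure isolates a single jump. This is why a modest long-run average can coexist with a one-year surge of more than 60%.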

CONCLUSION

This study offers the first comprehensive bibliometric analysis of ML in proteomics, revealing key themes such as deep learning, pretrained models, and multi-omics integration. Future efforts should focus on building interpretable models, enhancing cross-disciplinary collaboration, and ensuring secure, standardized data use to advance precision medicine.

SYSTEMATIC REVIEW REGISTRATION

https://doi.org/10.17605/OSF.IO/F4WUG.