Kiran Mahreen, Xie Ying, Anjum Nasreen, Ball Graham, Pierscionek Barbara, Russell Duncan
Faculty of Health, Medicine and Social Care, Anglia Ruskin University, Chelmsford, United Kingdom.
Faculty of Business and Management, Cranfield University School of Management, Cranfield, United Kingdom.
Front Digit Health. 2025 Mar 27;7:1557467. doi: 10.3389/fdgth.2025.1557467. eCollection 2025.
Type 2 Diabetes Mellitus (T2DM) remains a critical global health challenge, necessitating robust predictive models to enable early detection and personalized interventions. This study presents a comprehensive bibliometric and systematic review of 33 years (1991-2024) of research on machine learning (ML) and artificial intelligence (AI) applications in T2DM prediction. It highlights the growing complexity of the field and identifies key trends, methodologies, and research gaps.
A systematic methodology guided the literature selection process, starting with keyword identification using Term Frequency-Inverse Document Frequency (TF-IDF) and expert input. Based on these refined keywords, literature was systematically selected using PRISMA guidelines, resulting in a dataset of 2,351 articles from Web of Science and Scopus databases. Bibliometric analysis was performed on the entire selected dataset using tools such as VOSviewer and Bibliometrix, enabling thematic clustering, co-citation analysis, and network visualization. To assess the most impactful literature, a dual-criteria methodology combining relevance and impact scores was applied. Articles were qualitatively assessed on their alignment with T2DM prediction using a four-point relevance scale and quantitatively evaluated based on citation metrics normalized within subject, journal, and publication year. Articles scoring above a predefined threshold were selected for detailed review. The selected literature spans four time periods: 1991-2000, 2001-2010, 2011-2020, and 2021-2024.
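The TF-IDF keyword-identification step can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `tfidf_keywords`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the sample abstracts are all assumptions made here for illustration, and the abstract does not say how the resulting terms were combined with expert input.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=5):
    """Rank terms by their summed TF-IDF weight across a small corpus.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split, with no stop-word removal or stemming.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = Counter()
    for tokens in tokenized:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)           # relative term frequency
            idf = math.log(n_docs / df[term])  # 0 for terms in every document
            scores[term] += tf * idf

    return [term for term, _ in scores.most_common(top_k)]


# Made-up sample texts, not drawn from the reviewed literature.
abstracts = [
    "diabetes prediction machine learning",
    "diabetes risk deep learning",
    "diabetes cohort study",
]
candidate_keywords = tfidf_keywords(abstracts, top_k=3)
```

A term that appears in every document, such as "diabetes" in the sample above, receives an IDF of zero and drops out of the ranking. This is one reason candidate terms from TF-IDF are typically reviewed by domain experts before a search string is finalized.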
The bibliometric findings reveal exponential growth in publications since 2010, with the USA and UK leading contributions, followed by emerging contributors such as Singapore and India. Key thematic clusters include foundational ML techniques, epidemiological forecasting, predictive modelling, and clinical applications. Ensemble methods (e.g., Random Forest, Gradient Boosting) and deep learning models (e.g., Convolutional Neural Networks) dominate recent advancements. The literature analysis shows that early studies primarily used demographic and clinical variables, while recent efforts integrate genetic, lifestyle, and environmental predictors. It also highlights advances in integrating real-world datasets, emerging trends such as federated learning, and explainability tools such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations).
Future work should address gaps in generalizability, interdisciplinary T2DM prediction research, and psychosocial integration, while also focusing on clinically actionable solutions and real-world applicability to combat the growing diabetes epidemic effectively.