Center for Primary Care, Harvard Medical School, Boston, MA, USA.
Research and Population Health, Collective Health, San Francisco, CA, USA.
Curr Diab Rep. 2020 Dec 3;20(12):80. doi: 10.1007/s11892-020-01353-5.
Machine learning approaches, which seek to predict outcomes or classify patient features by recognizing patterns in large datasets, are increasingly applied to clinical epidemiology research on diabetes. Because these approaches are relatively new and emerged in fields outside of biomedical research, their terminology, techniques, and research findings may be unfamiliar to diabetes researchers. Our aim was to present the use of machine learning approaches in an accessible way, drawing from clinical epidemiological research in diabetes published from 1 January 2017 to 1 June 2020.
Machine learning approaches using tree-based learners, which produce decision trees to help guide clinical interventions, frequently have higher sensitivity and specificity than traditional regression models for risk prediction. Machine learning approaches using neural networks and "deep learning" can be applied to medical image data, particularly for identifying and staging diabetic retinopathy and skin ulcers. Among the machine learning approaches reviewed, researchers identified new strategies to develop standard datasets for rigorous comparisons across older and newer approaches, methods to illustrate how a machine learner was treating underlying data, and approaches to improve the transparency of the machine learning process. Machine learning approaches have the potential to improve risk stratification and outcome prediction for clinical epidemiology applications. Achieving this potential would be facilitated by the use of universal open-source datasets for fair comparisons. More work remains in applying strategies to communicate how machine learners generate their predictions.
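The comparison of tree-based learners and regression models above rests on sensitivity (the share of true cases a model flags) and specificity (the share of non-cases it correctly clears). As a minimal sketch of how the two metrics are computed, the Python below uses invented labels for ten hypothetical patients. The data, the variable names, and the `sensitivity_specificity` helper are assumptions made for illustration and are not drawn from any study in the review.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for labels coded 1 = case, 0 = non-case."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-cases correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-cases wrongly flagged

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")

    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data, invented for illustration only:
# 1 = patient developed the complication, 0 = did not.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
regression_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]  # assumed regression model output
tree_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]        # assumed tree-based model output

for name, pred in [("regression", regression_pred), ("tree-based", tree_pred)]:
    sens, spec = sensitivity_specificity(observed, pred)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this made-up example the tree-based predictions score higher on both metrics, which only illustrates the kind of comparison described and says nothing about real model performance. A fair comparison would evaluate both models on the same held-out data, which is why shared open-source datasets matter.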