Kernel machine learning methods to handle missing responses with complex predictors. Application in modelling five-year glucose changes using distributional representations.

Author information

CiTIUS (Centro Singular de Investigación en Tecnoloxías Intelixentes), Universidade de Santiago de Compostela, Santiago de Compostela 15782, Spain.

Publication information

Comput Methods Programs Biomed. 2022 Jun;221:106905. doi: 10.1016/j.cmpb.2022.106905. Epub 2022 May 25.

Abstract

BACKGROUND AND OBJECTIVES

Missing data are a ubiquitous problem in longitudinal studies because many patients are lost to follow-up. Kernel methods have enriched the machine learning field by successfully handling non-vectorial predictors, such as graphs, strings, and probability distributions, and have emerged as a promising tool for analysing the complex data produced by modern healthcare. This paper proposes a new set of kernel methods to handle missing data in the response variables. These methods are applied to predict long-term changes in glycated haemoglobin (A1c), the primary biomarker used to diagnose and monitor the progression of diabetes mellitus, with an emphasis on exploring the predictive potential of continuous glucose monitoring (CGM).

METHODS

We propose a new framework of non-linear kernel methods for testing statistical independence, selecting relevant predictors, and quantifying the uncertainty of the resulting predictive models. As a novelty in clinical analysis, we used a distributional representation of CGM as a predictor and compared its performance with that of traditional diabetes biomarkers.
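To make the idea of a distributional representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each patient's CGM trace is summarised by its empirical quantile function, that two patients are compared with a Gaussian kernel on the 2-Wasserstein distance between those quantile functions, and that a kernel ridge regression is fitted on patients with an observed response only. The complete-case fit is a deliberate simplification and is not the missing-data approach proposed in the paper. All function names, the synthetic data, the bandwidth rule, and the ridge value are illustrative assumptions.

```python
import numpy as np

def quantile_representation(glucose, n_grid=50):
    """Summarise one CGM trace by its empirical quantile function on a fixed grid."""
    probs = (np.arange(n_grid) + 0.5) / n_grid
    return np.quantile(np.asarray(glucose, dtype=float), probs)

def squared_w2(Q1, Q2):
    """Squared 2-Wasserstein distance between 1-D distributions.

    In one dimension this is the integral of the squared difference of the
    quantile functions, approximated here by the mean over the grid.
    """
    diff = Q1[:, None, :] - Q2[None, :, :]
    return (diff ** 2).mean(axis=2)

def gaussian_kernel(sq_dist, bandwidth):
    """Gaussian kernel evaluated on a matrix of squared distances."""
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

# Synthetic data (assumption): 120 patients, 288 glucose readings each.
rng = np.random.default_rng(0)
n_patients = 120
means = rng.uniform(100, 180, n_patients)
sds = rng.uniform(15, 40, n_patients)
traces = [rng.normal(m, s, 288) for m, s in zip(means, sds)]
Q = np.vstack([quantile_representation(t) for t in traces])

# Hypothetical response: an A1c-like change driven by mean glucose plus noise.
y = 0.02 * (means - 140) + rng.normal(0, 0.2, n_patients)
y[rng.random(n_patients) < 0.3] = np.nan  # simulate loss to follow-up

# Complete-case kernel ridge regression (simplification, see lead-in).
observed = ~np.isnan(y)
D_obs = squared_w2(Q[observed], Q[observed])
bandwidth = np.sqrt(np.median(D_obs[D_obs > 0]))  # median heuristic
K = gaussian_kernel(D_obs, bandwidth)
n_obs = K.shape[0]
ridge = 1e-2
alpha = np.linalg.solve(K + ridge * n_obs * np.eye(n_obs), y[observed])

# Predict the response for patients whose follow-up value is missing.
K_new = gaussian_kernel(squared_w2(Q[~observed], Q[observed]), bandwidth)
y_pred_missing = K_new @ alpha
```

Because the kernel only needs a distance between quantile functions, the whole glucose distribution enters the model without first being reduced to summary metrics such as mean glucose or time in range.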

RESULTS

The results show that, after the incorporation of CGM information, predictive ability increases from R=0.61 to R=0.71. In addition, uncertainty analysis helps characterise subpopulations in which predictive performance is worse, and suggests that clinical follow-up should be personalised according to the expected uncertainty in each patient's glucose values.

CONCLUSIONS

The proposed methods proved effective at handling missing data. They also have the potential to improve predictive performance by including new complex objects as explanatory variables and by modelling arbitrary dependence relations. Applied to a longitudinal study of diabetes, the methods showed that a distributional representation of CGM data provides greater sensitivity in predicting five-year A1c changes than classical diabetes biomarkers and traditional CGM metrics.
