Machine learning based study for the classification of Type 2 diabetes mellitus subtypes.

Author information

Ordoñez-Guillen Nelson E, Gonzalez-Compean Jose Luis, Lopez-Arevalo Ivan, Contreras-Murillo Miguel, Aldana-Bobadilla Edwin

Affiliations

Cinvestav Tamaulipas, Carretera Victoria-Soto la Marina km 5.5, Victoria, 87130, Tamaulipas, Mexico.

CONAHCYT-Centro de Investigación y de Estudios Avanzados del IPN, Unidad Tamaulipas, Carretera Victoria-Soto la Marina km 5.5, Victoria, Tamaulipas, 87130, Mexico.

Publication information

BioData Min. 2023 Aug 22;16(1):24. doi: 10.1186/s13040-023-00340-2.

Abstract

PURPOSE

Data-driven diabetes research is increasingly exploring the heterogeneity of the disease, aiming to support the development of more specific prognoses and treatments within so-called precision medicine. One such study recently identified five diabetes subgroups with varying risks of complications and treatment responses. Here, we develop and assess different models for classifying Type 2 Diabetes Mellitus (T2DM) subtypes through machine learning approaches, with the aim of providing a performance comparison and new insights on the matter.

METHODS

We developed a three-stage methodology. In the first stage, we preprocessed the public databases NHANES (USA) and ENSANUT (Mexico) to construct a dataset of N = 10,077 adult diabetes patient records. We used N = 2,768 records for training/validation of models and held out the remaining N = 7,309 for testing. In the second stage, we identified groups of observations, each representing a T2DM subtype. We tested different clustering techniques and strategies and validated them using internal and external clustering indices, obtaining two annotated datasets, Dset A and Dset B. In the third stage, we developed different classification models, evaluating four algorithms, seven input-data schemes, and two validation settings on each annotated dataset. We also tested the resulting models using a majority-vote approach to classify unseen patient records in the hold-out dataset.
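As an illustration of the majority-vote step, the following minimal sketch combines the labels that several trained models assign to each unseen record and then reports the resulting class proportions. The subtype labels, the number of models, the tie-breaking rule, and the function names are hypothetical choices made for this example; the abstract does not specify how the authors implemented the vote.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label listed first."""
    if not predictions:
        raise ValueError("no predictions supplied")
    counts = Counter(predictions)
    top = max(counts.values())
    # Walk the votes in order so that ties are broken deterministically.
    for label in predictions:
        if counts[label] == top:
            return label


def classify_records(per_model_predictions):
    """per_model_predictions[m][i] is the label model m gives record i."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical labels from three models for four hold-out records.
model_outputs = [
    ["S1", "S2", "S3", "S1"],
    ["S1", "S2", "S2", "S3"],
    ["S2", "S2", "S3", "S3"],
]

consensus = classify_records(model_outputs)

# Share of hold-out records assigned to each subtype by the vote.
class_proportions = {
    label: count / len(consensus) for label, count in Counter(consensus).items()
}
```

Breaking ties by vote order keeps the result reproducible; a real pipeline might instead prefer the label from the model with the best validation accuracy.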

RESULTS

In the independently obtained bootstrap validations for Dset A and Dset B, mean accuracies across all seven data schemes were [Formula: see text] ([Formula: see text]) and [Formula: see text] ([Formula: see text]), respectively. Best accuracies were [Formula: see text] and [Formula: see text]. Results from the two validation settings were consistent. For the hold-out dataset, class proportions were consistent with most of those reported in the literature.
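To make the idea of a bootstrap accuracy estimate concrete, here is a minimal sketch under stated assumptions: it uses out-of-bag evaluation (fit on a resample drawn with replacement, score on the records left out), a toy one-feature nearest-neighbour classifier, and synthetic data. None of these details come from the abstract, which does not say which bootstrap variant, classifier settings, or number of rounds were used.

```python
import random
import statistics


def nearest_neighbor_predict(train, x):
    """Label of the training point closest to x (1-NN on a single feature)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def bootstrap_accuracy(data, n_rounds=200, seed=0):
    """Mean and spread of out-of-bag accuracy over bootstrap rounds."""
    rng = random.Random(seed)
    n = len(data)
    accuracies = []
    for _ in range(n_rounds):
        in_bag = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(in_bag)
        out_of_bag = [i for i in range(n) if i not in chosen]
        if not out_of_bag:  # extremely unlikely; nothing to evaluate on
            continue
        train = [data[i] for i in in_bag]
        hits = sum(
            nearest_neighbor_predict(train, data[i][0]) == data[i][1]
            for i in out_of_bag
        )
        accuracies.append(hits / len(out_of_bag))
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


# Synthetic, well-separated toy data: (feature value, hypothetical subtype label).
toy_data = [(i * 0.1, "S1") for i in range(20)] + [
    (10 + i * 0.1, "S2") for i in range(20)
]

mean_acc, spread = bootstrap_accuracy(toy_data)
```

Reporting the mean together with a spread measure mirrors the "value (value)" pattern in the results, although the abstract does not state what the parenthesised quantity is.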

CONCLUSION

Developing machine learning systems for classifying diabetes subtypes is an important task to support physicians in fast and timely decision-making. We expect to deploy this methodology in a data analysis platform to conduct studies identifying T2DM subtypes in patient records from hospitals.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c977/10463725/b091220041ab/13040_2023_340_Fig1_HTML.jpg
