Machine learning based study for the classification of Type 2 diabetes mellitus subtypes.

Author information

Ordoñez-Guillen Nelson E, Gonzalez-Compean Jose Luis, Lopez-Arevalo Ivan, Contreras-Murillo Miguel, Aldana-Bobadilla Edwin

Affiliations

Cinvestav Tamaulipas, Carretera Victoria-Soto la Marina km 5.5, Victoria, 87130, Tamaulipas, Mexico.

CONAHCYT-Centro de Investigación y de Estudios Avanzados del IPN, Unidad Tamaulipas, Carretera Victoria-Soto la Marina km 5.5, Victoria, Tamaulipas, 87130, Mexico.

Publication information

BioData Min. 2023 Aug 22;16(1):24. doi: 10.1186/s13040-023-00340-2.

DOI:10.1186/s13040-023-00340-2

PMID:37608329

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10463725/

Abstract

PURPOSE

Data-driven diabetes research has shown growing interest in exploring the heterogeneity of the disease, aiming to support the development of more specific prognoses and treatments within so-called precision medicine. Recently, one such study found five diabetes subgroups with varying risks of complications and treatment responses. Here, we develop and assess different models for classifying Type 2 Diabetes (T2DM) subtypes through machine learning approaches, with the aim of providing a performance comparison and new insights on the matter.

METHODS

We developed a three-stage methodology. The first stage preprocessed the public databases NHANES (USA) and ENSANUT (Mexico) to construct a dataset of N = 10,077 adult diabetes patient records. We used N = 2,768 records for training/validation of models and held out the remaining N = 7,309 for testing. In the second stage, we identified groups of observations, each one representing a T2DM subtype. We tested different clustering techniques and strategies and validated them using internal and external clustering indices, obtaining two annotated datasets, Dset A and Dset B. In the third stage, we developed different classification models, evaluating four algorithms, seven input-data schemes, and two validation settings on each annotated dataset. We also tested the resulting models using a majority-vote approach to classify unseen patient records in the hold-out dataset.
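The majority-vote step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the subtype labels, and the tie-breaking rule (the first label seen wins) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among one record's model predictions.

    Ties go to the label that appears first in `predictions`. This tie rule
    is an assumption of the sketch; the abstract does not state one.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def vote_records(per_model_predictions):
    """Combine several models' outputs record by record.

    `per_model_predictions` holds one list per model, and each inner list
    holds that model's predicted label for every record, in the same order.
    """
    lengths = {len(preds) for preds in per_model_predictions}
    if len(lengths) != 1:
        raise ValueError("every model must predict the same number of records")
    return [majority_vote(list(votes)) for votes in zip(*per_model_predictions)]


# Hypothetical outputs of three models on three unseen records.
model_outputs = [
    ["subtype_1", "subtype_2", "subtype_3"],
    ["subtype_1", "subtype_4", "subtype_3"],
    ["subtype_2", "subtype_2", "subtype_3"],
]
combined = vote_records(model_outputs)
```

Here `combined` holds one voted label per record, which could then be tallied to obtain the class proportions in a hold-out set.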

RESULTS

From the independently obtained bootstrap validation for Dset A and Dset B, mean accuracies across all seven data schemes were [Formula: see text] ([Formula: see text]) and [Formula: see text] ([Formula: see text]), respectively. The best accuracies were [Formula: see text] and [Formula: see text]. Results from the two validation settings were consistent. For the hold-out dataset, the class proportions obtained were consistent with most of those reported in the literature.
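As a rough illustration of how a bootstrap can yield a mean accuracy together with a measure of spread, the sketch below resamples a fixed set of (true label, predicted label) pairs with replacement. This is a simplification chosen for brevity: it does not retrain any model in each round, and the abstract does not describe the authors' exact bootstrap procedure. The function name, the parameters `n_rounds` and `seed`, and the example labels are hypothetical.

```python
import random
import statistics


def bootstrap_accuracy(y_true, y_pred, n_rounds=1000, seed=0):
    """Estimate mean accuracy and its standard deviation by bootstrapping.

    Each round draws len(y_true) prediction outcomes with replacement and
    records the fraction that were correct.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    if n_rounds < 2:
        raise ValueError("need at least two rounds to compute a spread")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(y_true)
    correct = [t == p for t, p in zip(y_true, y_pred)]
    scores = []
    for _ in range(n_rounds):
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(sample) / n)
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical labels for eight records.
truth = ["a", "a", "b", "b", "c", "c", "a", "b"]
preds = ["a", "a", "b", "c", "c", "c", "b", "b"]
mean_acc, std_acc = bootstrap_accuracy(truth, preds)
```

The pair `(mean_acc, std_acc)` has the same shape as a "mean (spread)" summary, although the figures reported in the abstract are not reproduced here.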

CONCLUSION

Developing machine learning systems for the classification of diabetes subtypes is an important task for supporting physicians in fast and timely decision-making. We expect to deploy this methodology in a data analysis platform to conduct studies that identify T2DM subtypes in patient records from hospitals.
