Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction.

Affiliations

University of Girona, Campus Montilivi, building EPS4, 17071 Girona, Spain.

Biomedical Research Institute of Girona, Avda. de França, s/n, 17007 Girona, Spain; CIBERobn Pathophysiology of Obesity and Nutrition, Instituto de Salud Carlos III, Madrid, Spain.

Publication information

Artif Intell Med. 2018 Apr;85:43-49. doi: 10.1016/j.artmed.2017.09.005. Epub 2017 Sep 22.

DOI:10.1016/j.artmed.2017.09.005

PMID:28943335

Abstract

OBJECTIVE

Using artificial intelligence techniques to identify which Single Nucleotide Polymorphisms (SNPs) promote the development of a disease is a recurring theme in medical research, since such techniques may aid early diagnosis and the prescription of preventive measures. Here, the aim is to help physicians identify the SNPs relevant to Type 2 diabetes and to build a decision-support tool for risk prediction.

METHODS

We use the Random Forest (RF) technique to search for the attributes (SNPs) most strongly related to diabetes, assigning each attribute a weight (degree of importance) between 0 and 1. Support Vector Machines and Logistic Regression, two other machine learning techniques that are well established in the health community, are also applied, and their performance is compared with that of RF. Furthermore, the attribute relevance obtained with RF is used to make predictions with the k-Nearest Neighbour method, weighting each attribute in the similarity measure according to its RF relevance.
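The weighting scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a library, hyperparameters, or a genotype encoding, so the use of scikit-learn, the 0/1/2 allele coding, the synthetic data, and every variable name here are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for genotype data (assumption, not the study's data):
# 300 subjects x 20 SNPs, each coded as the number of minor alleles (0/1/2).
n_subjects, n_snps = 300, 20
X = rng.integers(0, 3, size=(n_subjects, n_snps)).astype(float)
# Hypothetical outcome driven by SNPs 0 and 1 plus noise.
y = ((X[:, 0] + X[:, 1] + rng.normal(0, 0.5, n_subjects)) > 2).astype(int)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# Step 1: fit a Random Forest and read off one importance weight per SNP.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

# scikit-learn's impurity-based importances are non-negative and sum to 1,
# so each weight lies in [0, 1].
weights = rf.feature_importances_

# Step 2: weighted k-NN. Scaling column j by sqrt(w_j) makes the ordinary
# Euclidean distance equal to sqrt(sum_j w_j * (a_j - b_j)^2).
scale = np.sqrt(weights)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train * scale, y_train)

# Fraction of positive neighbours, usable as a crude risk score.
risk = knn.predict_proba(X_test * scale)[:, 1]
print("Five highest-weighted SNP indices:", np.argsort(weights)[::-1][:5])
```

Rescaling the columns is only one way to weight a similarity measure; the paper may use a different similarity function, which the abstract does not specify.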

RESULTS

Testing is performed on a set of 677 subjects. RF is able to handle the complexity of feature interactions, overfitting, and unknown attribute values, and it provides the SNPs' relevance while reaching an area under the ROC curve of up to 0.89 for risk prediction. RF outperforms all the other tested machine learning techniques in both prediction accuracy and the stability of the estimated attribute relevance.
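A comparison of this kind can be sketched with cross-validated area under the ROC curve. The abstract does not describe the evaluation protocol, so the 5-fold cross-validation, the model settings, and the synthetic data below are all assumptions made for illustration; the printed numbers say nothing about the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic stand-in for genotype data (assumption): SNPs coded 0/1/2.
X = rng.integers(0, 3, size=(300, 20)).astype(float)
# Hypothetical outcome driven by SNPs 0 and 1 plus noise.
y = ((X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 300)) > 2).astype(int)

# The three model families named in the abstract, with assumed settings.
models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "LR": LogisticRegression(max_iter=1000),
}

auc = {}
for name, model in models.items():
    # scoring="roc_auc" uses predict_proba or decision_function as available.
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    auc[name] = scores.mean()
    print(f"{name}: mean AUC = {auc[name]:.3f}")
```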

CONCLUSIONS

Random Forest is a useful method for learning predictive models and the relevance of SNPs without any underlying assumption.

Similar articles

Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.

Machine Learning-Based Method for Obesity Risk Evaluation Using Single-Nucleotide Polymorphisms Derived from Next-Generation Sequencing.

J Comput Biol. 2018 Dec;25(12):1347-1360. doi: 10.1089/cmb.2018.0002. Epub 2018 Sep 8.

Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers.

Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838.

Learning ensemble classifiers for diabetic retinopathy assessment.

Artif Intell Med. 2018 Apr;85:50-63. doi: 10.1016/j.artmed.2017.09.006. Epub 2017 Oct 6.

Using a multi-staged strategy based on machine learning and mathematical modeling to predict genotype-phenotype risk patterns in diabetic kidney disease: a prospective case-control cohort analysis.

BMC Nephrol. 2013 Jul 23;14:162. doi: 10.1186/1471-2369-14-162.

A decision support system to facilitate management of patients with acute gastrointestinal bleeding.

Artif Intell Med. 2008 Mar;42(3):247-59. doi: 10.1016/j.artmed.2007.10.003. Epub 2007 Dec 11.

Machine learning approach to single nucleotide polymorphism-based asthma prediction.

PLoS One. 2019 Dec 4;14(12):e0225574. doi: 10.1371/journal.pone.0225574. eCollection 2019.

Machine learning models in breast cancer survival prediction.

Technol Health Care. 2016;24(1):31-42. doi: 10.3233/THC-151071.

A comparative study on feature selection for a risk prediction model for colorectal cancer.

Comput Methods Programs Biomed. 2019 Aug;177:219-229. doi: 10.1016/j.cmpb.2019.06.001. Epub 2019 Jun 4.

Cited by

Genetic Artificial Intelligence in Gastrointestinal Disease: A Systematic Review.

Diagnostics (Basel). 2025 Sep 2;15(17):2227. doi: 10.3390/diagnostics15172227.

DNA sequence classification for diabetes mellitus using NuSVC and XGBoost: A comparative.

PLoS One. 2025 Jul 18;20(7):e0328253. doi: 10.1371/journal.pone.0328253. eCollection 2025.

Ge-SAND: an explainable deep learning-driven framework for disease risk prediction by uncovering complex genetic interactions in parallel.

BMC Genomics. 2025 May 1;26(1):432. doi: 10.1186/s12864-025-11588-9.

Diabetes prediction model for unbalanced community follow-up data set based on optimal feature selection and scorecard.

Digit Health. 2024 Feb 29;10:20552076241236370. doi: 10.1177/20552076241236370. eCollection 2024 Jan-Dec.

Cuproptosis-associated ncRNAs predict breast cancer subtypes.

PLoS One. 2024 Feb 26;19(2):e0299138. doi: 10.1371/journal.pone.0299138. eCollection 2024.

Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES).

BMC Bioinformatics. 2024 Feb 2;25(1):56. doi: 10.1186/s12859-024-05677-x.

Hybrid feature selection and classification technique for early prediction and severity of diabetes type 2.

PLoS One. 2024 Jan 18;19(1):e0292100. doi: 10.1371/journal.pone.0292100. eCollection 2024.

Artificial intelligence in diabetes management: Advancements, opportunities, and challenges.

Cell Rep Med. 2023 Oct 17;4(10):101213. doi: 10.1016/j.xcrm.2023.101213. Epub 2023 Oct 2.

Strong Cumulative Evidence of Associations of 6 Single Nucleotide Polymorphisms with Ovarian Cancer Risk: An Umbrella Review.

J Clin Med. 2023 Mar 3;12(5):2025. doi: 10.3390/jcm12052025.

Identification of the Optimal Model for the Prediction of Diabetic Retinopathy in Chinese Rural Population: Handan Eye Study.

J Diabetes Res. 2022 Nov 16;2022:4282953. doi: 10.1155/2022/4282953. eCollection 2022.
