Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction.

Affiliations

University of Girona, Campus Montilivi, building EPS4, 17071 Girona, Spain.

Biomedical Research Institute of Girona, Avda. de França, s/n, 17007 Girona, Spain; CIBERobn Pathophysiology of Obesity and Nutrition, Instituto de Salud Carlos III, Madrid, Spain.

Publication information

Artif Intell Med. 2018 Apr;85:43-49. doi: 10.1016/j.artmed.2017.09.005. Epub 2017 Sep 22.

Abstract

OBJECTIVE

Using artificial intelligence techniques to identify which Single Nucleotide Polymorphisms (SNPs) promote the development of a disease is an established line of medical research, since such techniques may aid early diagnosis and help in prescribing preventive measures. In particular, this work aims to help physicians identify the SNPs relevant to Type 2 diabetes and to build a decision-support tool for risk prediction.

METHODS

We use the Random Forest (RF) technique to search for the attributes (SNPs) most strongly related to diabetes, assigning each attribute a weight (degree of importance) between 0 and 1. Support Vector Machines and Logistic Regression, two other machine learning techniques that are well established in the health community, are also applied, and their performance is compared with that of RF. Furthermore, the attribute relevance obtained with RF is used to make predictions with the k-Nearest Neighbour method, weighting each attribute in the similarity measure according to its RF relevance.
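The relevance-weighted k-Nearest Neighbour step can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the genotype encoding, or the value of k, so everything below is an assumption made for illustration: a weighted Euclidean distance, genotypes coded as minor-allele counts, k = 3, and hypothetical relevance weights standing in for values that RF would supply.

```python
from collections import Counter
from math import sqrt


def weighted_distance(a, b, weights):
    """Euclidean distance with each attribute scaled by its relevance weight."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_predict(train_X, train_y, query, weights, k=3):
    """Majority vote among the k training subjects closest to `query`."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: weighted_distance(pair[0], query, weights),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy genotypes for three hypothetical SNPs, coded as minor-allele counts (0, 1, 2).
train_X = [[0, 2, 1], [0, 0, 2], [2, 2, 0], [2, 0, 1]]
train_y = [0, 0, 1, 1]  # made-up labels: 1 = case, 0 = control

# Assumed relevance weights in [0, 1]; in the paper these would come from RF.
relevance = [0.9, 0.05, 0.05]

prediction = knn_predict(train_X, train_y, [2, 1, 2], relevance, k=3)
print(prediction)
```

With these weights the first SNP dominates the distance, so subjects who share the query's genotype at that SNP are treated as its nearest neighbours even when they differ at the other two.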

RESULTS

Testing is performed on a set of 677 subjects. RF is able to handle the complexity of feature interactions, overfitting, and unknown attribute values, and yields SNP relevance estimates with an area under the ROC curve of up to 0.89 for risk prediction. RF outperforms all the other tested machine learning techniques both in prediction accuracy and in the stability of the estimated attribute relevance.
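For readers less familiar with the metric, the area under the ROC curve equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way on made-up scores; the function name and the data are illustrative assumptions and do not reproduce the study's evaluation.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: two cases (label 1) and two controls (label 0).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2]

auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 case/control pairs are ordered correctly -> 0.75
```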

CONCLUSIONS

The Random Forest is a useful method for learning predictive models and the relevance of SNPs without any underlying assumption.
