Feature Selection for Hypertension Risk Prediction Using XGBoost on Single Nucleotide Polymorphism Data.

Authors

Muflikhah Lailil, Fatyanosa Tirana Noor, Widodo Nashi, Perdana Rizal Setya, Ratnawati Hana

Affiliations

Department of Informatics Engineering, Faculty of Computer Science, Brawijaya University, Malang, Indonesia.

Department of Biology, Faculty of Mathematics and Natural Sciences, Brawijaya University, Malang, Indonesia.

Publication information

Healthc Inform Res. 2025 Jan;31(1):16-22. doi: 10.4258/hir.2025.31.1.16. Epub 2025 Jan 31.

Abstract

OBJECTIVES

Hypertension (high blood pressure) is a prevalent, chronic condition that affects a large share of the adult population worldwide. Left untreated, it can lead to severe complications, including kidney problems, heart disease, and stroke. This study aims to develop a feature selection model based on the XGBoost algorithm that identifies specific single nucleotide polymorphisms (SNPs) as biomarkers for detecting hypertension risk.

METHODS

We use high-dimensional genetic variation data (i.e., SNPs) as markers of hypertension to build a classifier for risk prediction. The data come from the OpenSNP dataset, which includes 19,697 SNPs from 2,052 samples. Extreme gradient boosting (XGBoost), an ensemble machine learning method that incrementally adjusts weights over a series of boosting steps, is employed here for feature selection.
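The abstract does not give the selection procedure in detail, so the following is only a minimal sketch of one common way to do importance-based feature selection with a boosted-tree model. The genotype matrix is synthetic (not OpenSNP data), and the sizes, hyperparameters, and the helper `select_top_k` are illustrative assumptions, not the authors' settings. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different implementation of gradient boosting.

```python
import numpy as np

try:
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Fallback (not XGBoost): scikit-learn's gradient boosting, so the sketch still runs.
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Synthetic stand-in for SNP genotypes coded as minor-allele counts (0, 1, 2).
# The study used 19,697 SNPs from 2,052 samples; this toy matrix is much smaller.
rng = np.random.default_rng(0)
n_samples, n_snps, n_causal = 600, 200, 5
X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)

# Hypothetical phenotype: driven by the first n_causal SNPs plus Gaussian noise.
signal = X[:, :n_causal].sum(axis=1) + rng.normal(0.0, 0.5, n_samples)
y = (signal > np.median(signal)).astype(int)


def select_top_k(X, y, k):
    """Fit a boosted-tree classifier and return the indices of the k most
    important features together with the full importance vector."""
    model = Booster(n_estimators=100, max_depth=3, learning_rate=0.1, random_state=0)
    model.fit(X, y)
    importances = np.asarray(model.feature_importances_)
    order = np.argsort(importances)[::-1]  # most important first
    return order[:k], importances


selected, importances = select_top_k(X, y, k=10)
print("Selected SNP column indices:", sorted(selected.tolist()))
```

In a real analysis the selected columns would then be passed to a downstream classifier and evaluated on held-out samples; ranking and evaluating on the same data would overstate performance.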

RESULTS

The experiments identified 292 SNPs that yielded high predictive performance, with an F1-score of 98.55%, precision of 98.73%, recall of 98.38%, and overall accuracy of 98%. In predicting hypertension risk, the XGBoost feature selection method outperformed other representative feature selection methods, such as genetic algorithms, analysis of variance, chi-square, and principal component analysis.
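As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall given in the abstract; the function name `f1_from_precision_recall` is just an illustrative helper.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (98.73% and 98.38%).
precision, recall = 0.9873, 0.9838
f1 = f1_from_precision_recall(precision, recall)
print(f"F1 = {f1:.2%}")  # prints "F1 = 98.55%"
```

The recomputed value agrees with the reported F1-score of 98.55%.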

CONCLUSIONS

We developed a model for predicting hypertension from SNP data. The XGBoost feature selection method effectively managed the high dimensionality of the SNP data and identified significant features as biomarkers. The results indicate high performance in predicting hypertension risk.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0807/11854617/98b70ae7c478/hir-2025-31-1-16f1.jpg
