Feature Selection for Hypertension Risk Prediction Using XGBoost on Single Nucleotide Polymorphism Data.

Authors

Muflikhah Lailil, Fatyanosa Tirana Noor, Widodo Nashi, Perdana Rizal Setya, Ratnawati Hana

Affiliations

Department of Informatics Engineering, Faculty of Computer Science, Brawijaya University, Malang, Indonesia.

Department of Biology, Faculty of Mathematics and Natural Sciences, Brawijaya University, Malang, Indonesia.

Publication information

Healthc Inform Res. 2025 Jan;31(1):16-22. doi: 10.4258/hir.2025.31.1.16. Epub 2025 Jan 31.

DOI:10.4258/hir.2025.31.1.16
PMID:39973033
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11854617/
Abstract

OBJECTIVES

Hypertension (high blood pressure) is a prevalent and serious chronic condition that affects a significant portion of the adult population worldwide. Left unaddressed, it can lead to severe health complications, including kidney problems, heart disease, and stroke. This study aims to develop a feature selection model based on the XGBoost algorithm to identify specific single nucleotide polymorphisms (SNPs) as biomarkers for detecting hypertension risk.

METHODS

We propose building a classifier for hypertension prediction from high-dimensional genetic variation data, using SNPs as markers for hypertension in patients. We used the OpenSNP dataset, which includes 19,697 SNPs from 2,052 samples. Extreme gradient boosting (XGBoost), an ensemble machine learning method that incrementally adjusts weights over a series of steps, was employed for feature selection.

RESULTS

The experiments identified 292 SNPs that yielded high predictive performance, with an F1-score of 98.55%, precision of 98.73%, recall of 98.38%, and overall accuracy of 98%. In these experiments, the XGBoost feature selection method outperformed other representative feature selection methods (genetic algorithms, analysis of variance, chi-square, and principal component analysis) in predicting hypertension risk.
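As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall given above; the function name f1_from_precision_recall is chosen here for illustration only.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (98.73% and 98.38%).
f1 = f1_from_precision_recall(0.9873, 0.9838)
print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9855"
```

The recomputed value matches the reported F1-score of 98.55%.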

CONCLUSIONS

We developed a model for predicting hypertension from an SNP dataset. The XGBoost feature selection method effectively managed the high dimensionality of the SNP data and identified significant features as biomarkers. The results indicate high performance in predicting the risk of hypertension.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0807/11854617/b2fa6c1848fc/hir-2025-31-1-16f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0807/11854617/98b70ae7c478/hir-2025-31-1-16f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0807/11854617/ff8ac0a49eb3/hir-2025-31-1-16f2.jpg

Similar articles

1. Feature Selection for Hypertension Risk Prediction Using XGBoost on Single Nucleotide Polymorphism Data.
   Healthc Inform Res. 2025 Jan;31(1):16-22. doi: 10.4258/hir.2025.31.1.16. Epub 2025 Jan 31.
2. Risk score prediction model based on single nucleotide polymorphism for predicting malaria: a machine learning approach.
   BMC Bioinformatics. 2022 Aug 7;23(1):325. doi: 10.1186/s12859-022-04870-0.
3. Prediction and feature selection of low birth weight using machine learning algorithms.
   J Health Popul Nutr. 2024 Oct 12;43(1):157. doi: 10.1186/s41043-024-00647-8.
4. A Machine-Learning-Based Prediction Method for Hypertension Outcomes Based on Medical Data.
   Diagnostics (Basel). 2019 Nov 7;9(4):178. doi: 10.3390/diagnostics9040178.
5. Prediction of cardiovascular disease based on multiple feature selection and improved PSO-XGBoost model.
   Sci Rep. 2025 Apr 11;15(1):12406. doi: 10.1038/s41598-025-96520-7.
6. Prediction of lung cancer risk in Chinese population with genetic-environment factor using extreme gradient boosting.
   Cancer Med. 2022 Dec;11(23):4469-4478. doi: 10.1002/cam4.4800. Epub 2022 May 2.
7. The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost .
   Genes (Basel). 2022 Jun 23;13(7):1129. doi: 10.3390/genes13071129.
8. Prediction model of atrial fibrillation recurrence after Cox-Maze IV procedure in patients with chronic valvular disease and atrial fibrillation based on machine learning algorithm.
   Zhong Nan Da Xue Xue Bao Yi Xue Ban. 2023 Jul 28;48(7):995-1007. doi: 10.11817/j.issn.1672-7347.2023.230018.
9. Early detection of Alzheimer's disease using single nucleotide polymorphisms analysis based on gradient boosting tree.
   Comput Biol Med. 2022 Jul;146:105622. doi: 10.1016/j.compbiomed.2022.105622. Epub 2022 May 24.
10. Predicting pathological response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer with two step feature selection and ensemble learning.
   Sci Rep. 2025 Mar 22;15(1):9936. doi: 10.1038/s41598-025-94337-y.
