Risk score prediction model based on single nucleotide polymorphism for predicting malaria: a machine learning approach.

Affiliation

School of Information Technology, Monash University Malaysia, Subang Jaya, Selangor, Malaysia.

Publication information

BMC Bioinformatics. 2022 Aug 7;23(1):325. doi: 10.1186/s12859-022-04870-0.

DOI: 10.1186/s12859-022-04870-0
PMID: 35934714
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9358850/
Abstract

BACKGROUND

The malaria risk prediction is currently limited to using advanced statistical methods, such as time series and cluster analysis on epidemiological data. Nevertheless, machine learning models have been explored to study the complexity of malaria through blood smear images and environmental data. However, to the best of our knowledge, no study analyses the contribution of Single Nucleotide Polymorphisms (SNPs) to malaria using a machine learning model. More specifically, this study aims to quantify an individual's susceptibility to the development of malaria by using risk scores obtained from the cumulative effects of SNPs, known as weighted genetic risk scores (wGRS).
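
As a rough illustration of the weighted genetic risk score idea described above, the sketch below uses a common formulation in which each SNP contributes its risk-allele count (0, 1, or 2) multiplied by the natural log of its odds ratio. The abstract does not state the exact weighting used in the study, so the formula, the function name `weighted_grs`, and all numeric values here are assumptions chosen for illustration only.

```python
import math


def weighted_grs(allele_counts, odds_ratios):
    """Sum of (risk-allele count x ln(odds ratio)) over SNPs.

    Assumed formulation; the study's exact weights are not given in the abstract.
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("allele_counts and odds_ratios must have the same length")
    return sum(count * math.log(odds) for count, odds in zip(allele_counts, odds_ratios))


# Hypothetical individual: risk-allele counts at three made-up SNPs.
counts = [2, 1, 0]
# Hypothetical per-SNP odds ratios (illustrative values, not taken from the paper).
ratios = [1.5, 2.0, 1.2]

score = weighted_grs(counts, ratios)
print(f"wGRS = {score:.4f}")
```

An odds ratio below 1 yields a negative weight, so a protective allele lowers the score under this formulation.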

RESULTS

We proposed an SNP-based feature extraction algorithm that incorporates the susceptibility information of an individual to malaria to generate the feature set. However, it can become computationally expensive for a machine learning model to learn from many SNPs. Therefore, we reduced the feature set by employing the Logistic Regression and Recursive Feature Elimination (LR-RFE) method to select SNPs that improve the efficacy of our model. Next, we calculated the wGRS of the selected feature set, which is used as the model's target variables. Moreover, to compare the performance of the wGRS-only model, we calculated and evaluated the combination of wGRS with genotype frequency (wGRS + GF). Finally, Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and Ridge regression algorithms are utilized to establish the machine learning models for malaria risk prediction.
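
The LR-RFE step described above can be sketched with scikit-learn's `RFE` wrapped around a `LogisticRegression` estimator. This is a minimal sketch on synthetic data, not the authors' pipeline: the genotype matrix, the labels, the number of SNPs retained, and the solver settings are all assumptions made for the demo.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic genotype matrix: 200 individuals x 10 SNPs, coded as 0/1/2 allele copies.
X = rng.integers(0, 3, size=(200, 10))

# Synthetic case/control labels driven mainly by columns 0 and 3 (demo assumption).
signal = 1.5 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0.0, 0.5, size=200)
y = (signal > 0).astype(int)

# Recursively drop the SNP with the smallest absolute coefficient until 3 remain.
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    step=1,
)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)  # column indices of retained SNPs
X_reduced = selector.transform(X)             # reduced feature matrix
print("selected SNP columns:", selected)
```

`selector.ranking_` gives the elimination order (1 marks a retained feature), which can help when choosing how many SNPs to keep.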

CONCLUSIONS

Our proposed approach identified SNP rs334 as the most contributing feature with an importance score of 6.224 compared to the baseline, with an importance score of 1.1314. This is an important result as prior studies have proven that rs334 is a major genetic risk factor for malaria. The analysis and comparison of the three machine learning models demonstrated that LightGBM achieves the highest model performance with a Mean Absolute Error (MAE) score of 0.0373. Furthermore, based on wGRS + GF, all models performed significantly better than wGRS alone, in which LightGBM obtained the best performance (0.0033 MAE score).
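
For readers unfamiliar with the metric reported above, Mean Absolute Error is the average of the absolute differences between predicted and true values, so a lower score means predictions lie closer to the target wGRS. The sketch below computes it in plain Python; the score values are invented for illustration and are not results from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted risk scores for three individuals.
true_scores = [0.10, 0.20, 0.30]
predicted_scores = [0.12, 0.18, 0.33]

mae = mean_absolute_error(true_scores, predicted_scores)
print(f"MAE = {mae:.4f}")
```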
