Machine Learning-Based Method for Obesity Risk Evaluation Using Single-Nucleotide Polymorphisms Derived from Next-Generation Sequencing.

Author information

Wang Hsin-Yao, Chang Shih-Cheng, Lin Wan-Ying, Chen Chun-Hsien, Chiang Szu-Hsien, Huang Kai-Yao, Chu Bo-Yu, Lu Jang-Jih, Lee Tzong-Yi

Affiliations

1 Department of Laboratory Medicine, Chang Gung Memorial Hospital, Taoyuan City, Taiwan.

9 Ph.D. Program in Biomedical Engineering, Chang Gung University, Taoyuan City, Taiwan.

Publication information

J Comput Biol. 2018 Dec;25(12):1347-1360. doi: 10.1089/cmb.2018.0002. Epub 2018 Sep 8.

DOI:10.1089/cmb.2018.0002

PMID:30204480

Abstract

Obesity is a major risk factor for many metabolic diseases. To understand the genetic characteristics of obese individuals, single-nucleotide polymorphisms (SNPs) derived from next-generation sequencing (NGS) provide comprehensive, genome-wide insight. However, the high complexity of NGS data makes these SNP data difficult to interpret for clinical application. In this study, SNP-based obesity risk prediction models were therefore designed using machine learning (ML) methods, namely support vector machine (SVM), k-nearest neighbor (kNN), and decision tree (DT). Clinicopathological features, including 130 SNPs, sex, and age, were obtained from 139 eligible individuals. Various feature selection methods, such as stepwise multivariate linear regression (MLR), DT, and genetic algorithms, were applied to select informative features for the obesity prediction models, and multivariate logistic regression was used to evaluate the importance of the selected features. The predictive performance of the models trained on the different feature sets was evaluated with fivefold cross-validation, and three measures (accuracy, sensitivity, and specificity) were used to compare predictive power across models. Nine SNPs, namely rs10501087, rs17700144, rs2287019, rs534870, rs660339, rs7081678, rs718314, rs9816226, and rs984222, were selected by stepwise MLR to build the ML prediction models. When trained on the same features, the SVM model significantly outperformed the other classifiers, achieving 70.77% accuracy, 80.09% sensitivity, and 63.02% specificity. This investigation demonstrated that the selected SNPs were effective in detecting obesity risk. Additionally, the ML-based method provides a feasible means of conducting preliminary analyses of the genetic characteristics of obesity.
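The evaluation protocol summarized in the abstract (fivefold cross-validation scored by accuracy, sensitivity, and specificity) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: it uses a from-scratch k-nearest-neighbor classifier as a stand-in for the three classifiers, assumes each SNP is encoded as a 0/1/2 minor-allele count, treats obese as the positive class (label 1), uses an unstratified shuffled split, and pools out-of-fold predictions before computing the metrics. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity), treating label 1 (obese) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: fraction of obese individuals predicted as obese.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Specificity: fraction of non-obese individuals predicted as non-obese.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=5, n_neighbors=3, seed=0):
    """k-fold CV: each fold is held out once; out-of-fold predictions are pooled."""
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            y_true.append(y[i])
            y_pred.append(knn_predict(train_X, train_y, X[i], n_neighbors))
    return confusion_metrics(y_true, y_pred)


if __name__ == "__main__":
    # Synthetic toy data only (not the study's cohort): 139 individuals x 9 SNPs,
    # each genotype coded as a 0/1/2 minor-allele count, with random obese/non-obese labels.
    rng = random.Random(42)
    X = [[rng.randint(0, 2) for _ in range(9)] for _ in range(139)]
    y = [rng.randint(0, 1) for _ in range(139)]
    acc, sens, spec = cross_validate(X, y)
    print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because the toy labels are random, the printed numbers say nothing about obesity; the sketch only shows how the three measures follow from pooled out-of-fold predictions. The abstract does not state whether the authors pooled predictions or averaged per-fold metrics, or whether their folds were stratified.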

Similar articles

Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.

A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data.

Bioinformatics. 2013 Jun 1;29(11):1361-6. doi: 10.1093/bioinformatics/btt172. Epub 2013 Apr 24.

Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction.

Artif Intell Med. 2018 Apr;85:43-49. doi: 10.1016/j.artmed.2017.09.005. Epub 2017 Sep 22.

Whole-genome sequence-based genomic prediction in laying chickens with different genomic relationship matrices to account for genetic architecture.

Genet Sel Evol. 2017 Jan 16;49(1):8. doi: 10.1186/s12711-016-0277-y.

Machine learning random forest for predicting oncosomatic variant NGS analysis.

Sci Rep. 2021 Nov 8;11(1):21820. doi: 10.1038/s41598-021-01253-y.

Next generation sequencing of SNPs using the HID-Ion AmpliSeq™ Identity Panel on the Ion Torrent PGM™ platform.

Forensic Sci Int Genet. 2016 Nov;25:73-84. doi: 10.1016/j.fsigen.2016.07.021. Epub 2016 Jul 29.

Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance.

BMC Res Notes. 2014 Oct 22;7:747. doi: 10.1186/1756-0500-7-747.

Mutation probability of cytochrome P450 based on a genetic algorithm and support vector machine.

Biotechnol J. 2011 Nov;6(11):1367-76. doi: 10.1002/biot.201000450. Epub 2011 Jul 1.

Derivation and validation of different machine-learning models in mortality prediction of trauma in motorcycle riders: a cross-sectional retrospective study in southern Taiwan.

BMJ Open. 2018 Jan 5;8(1):e018252. doi: 10.1136/bmjopen-2017-018252.

Cited by

Understanding food addiction in obesity: a genetic perspective.

J Eat Disord. 2025 Aug 28;13(1):191. doi: 10.1186/s40337-025-01387-8.

Application and Analysis of Random Forest and Support Vector Classification in Risk Prediction of Childhood Obesity and Hyperuricemia.

Diabetes Metab Syndr Obes. 2025 Jul 7;18:2221-2233. doi: 10.2147/DMSO.S519284. eCollection 2025.

Harnessing Artificial Intelligence in Obesity Research and Management: A Comprehensive Review.

Diagnostics (Basel). 2025 Feb 6;15(3):396. doi: 10.3390/diagnostics15030396.

Deep learning captures the effect of epistasis in multifactorial diseases.

Front Med (Lausanne). 2025 Jan 7;11:1479717. doi: 10.3389/fmed.2024.1479717. eCollection 2024.

Integrating Artificial Intelligence for Advancing Multiple-Cancer Early Detection via Serum Biomarkers: A Narrative Review.

Cancers (Basel). 2024 Feb 21;16(5):862. doi: 10.3390/cancers16050862.

Association between adiposity and facial aging: results from a Mendelian randomization study.

Eur J Med Res. 2023 Sep 15;28(1):350. doi: 10.1186/s40001-023-01236-x.

Risk Stratification for Herpes Simplex Virus Pneumonia Using Elastic Net Penalized Cox Proportional Hazard Algorithm with Enhanced Explainability.

J Clin Med. 2023 Jul 5;12(13):4489. doi: 10.3390/jcm12134489.

Predicting Overweight and Obesity Status Among Malaysian Working Adults With Machine Learning or Logistic Regression: Retrospective Comparison Study.

JMIR Form Res. 2022 Dec 7;6(12):e40404. doi: 10.2196/40404.

Applications of Artificial Intelligence to Obesity Research: Scoping Review of Methodologies.

J Med Internet Res. 2022 Dec 7;24(12):e40589. doi: 10.2196/40589.

Single-nucleotide Polymorphisms in Medical Nutritional Weight Loss: Challenges and Future Directions.

J Transl Int Med. 2022 Apr 9;10(1):1-4. doi: 10.2478/jtim-2022-0002. eCollection 2022 Mar.
