Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES).

Affiliations

Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea.

Department of Mathematical Data Science, College of Science and Convergence Technology, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea.

Publication information

BMC Bioinformatics. 2024 Feb 2;25(1):56. doi: 10.1186/s12859-024-05677-x.

Abstract

BACKGROUND

Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES).

RESULTS

First, single-nucleotide polymorphisms (SNPs) were selected via single-variant tests using logistic regression adjusted for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator (LASSO), elastic net, smoothly clipped absolute deviation (SCAD), support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve (AUC), precision, recall, F1-score, Cohen's kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to address class imbalance.
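
The abstract lists the evaluation criteria without giving their formulas. The following is a minimal, dependency-free Python sketch of the standard textbook definitions, not the authors' code. It assumes binary labels (1 = asthma case, 0 = control), hard predictions obtained by thresholding predicted probabilities at a hypothetical cutoff of 0.5, and average precision as the estimator for the area under the precision-recall curve (other estimators, such as trapezoidal integration, give slightly different values). All function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn), treating label 1 as a case and 0 as a control."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 for an empty denominator (a common convention)."""
    return num / den if den else 0.0


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Metrics computed from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # sensitivity
    specificity = _ratio(tn, tn + fp)
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa: products of predicted and true marginals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "kappa": _ratio(accuracy - p_chance, 1 - p_chance),
        "balanced_accuracy": (recall + specificity) / 2,
        "error_rate": 1 - accuracy,
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: sum of (R_k - R_{k-1}) * P_k."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores enter the curve together at one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy data only (not from KoGES): 3 cases, 5 controls.
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    prob = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
    pred = [1 if p >= 0.5 else 0 for p in prob]  # hypothetical 0.5 cutoff
    print(threshold_metrics(y, pred))
    print(roc_auc(y, prob), average_precision(y, prob))
```

When cases are rare, error rate alone can look favorable for a classifier that labels every sample as a control, which is one reason chance-corrected and class-balanced measures such as Cohen's kappa, balanced accuracy, and the Matthews correlation coefficient are reported alongside it.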

CONCLUSIONS

Our results show that penalized methods achieved better predictive performance for asthma than machine learning methods. In the oversampling study, however, random forest and boosting methods overall showed better predictive performance than penalized methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a3a/10837879/f6e5c5bbd10f/12859_2024_5677_Fig1_HTML.jpg
