Wang Hsin-Yao, Chang Shih-Cheng, Lin Wan-Ying, Chen Chun-Hsien, Chiang Szu-Hsien, Huang Kai-Yao, Chu Bo-Yu, Lu Jang-Jih, Lee Tzong-Yi
1 Department of Laboratory Medicine, Chang Gung Memorial Hospital, Taoyuan City, Taiwan.
9 Ph.D. Program in Biomedical Engineering, Chang Gung University, Taoyuan City, Taiwan.
J Comput Biol. 2018 Dec;25(12):1347-1360. doi: 10.1089/cmb.2018.0002. Epub 2018 Sep 8.
Obesity is a major risk factor for many metabolic diseases. Single-nucleotide polymorphisms (SNPs) derived from next-generation sequencing (NGS) offer genome-wide insight into the genetic characteristics of obese individuals. However, interpreting these SNP data for clinical application is difficult given the high complexity of NGS data. In this study, SNP-based obesity risk prediction models were therefore designed using three machine learning (ML) methods: support vector machine (SVM), k-nearest neighbor, and decision tree (DT). Clinicopathological features, including 130 SNPs, sex, and age, were obtained from 139 eligible individuals. Several feature selection methods, such as stepwise multivariate linear regression (MLR), DT, and genetic algorithms, were applied to select informative features for building the prediction models. Multivariate logistic regression was used to evaluate the importance of the selected features. The predictive performance of the models trained on each feature set was evaluated by fivefold cross-validation. Three measures, namely accuracy, sensitivity, and specificity, were used to compare predictive power among the models. Nine SNPs (rs10501087, rs17700144, rs2287019, rs534870, rs660339, rs7081678, rs718314, rs9816226, and rs984222) were selected by stepwise MLR for model construction. When trained on the same features, the SVM model significantly outperformed the other classifiers, achieving 70.77% accuracy, 80.09% sensitivity, and 63.02% specificity. These results indicate that the selected SNPs are effective for detecting obesity risk. Additionally, the ML-based method provides a feasible means of conducting preliminary analyses of the genetic characteristics of obesity.
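The evaluation scheme described in the abstract (fivefold cross-validation scored by accuracy, sensitivity, and specificity) can be illustrated with a minimal standard-library Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the genotype data are randomly generated (139 samples, 9 features coded 0/1/2), the labels are arbitrary, the classifier is a simple k-nearest-neighbor vote with Manhattan distance, and the three measures are computed once over pooled out-of-fold predictions, whereas the study may have averaged them per fold. All function names are hypothetical.

```python
import random


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity); label 1 = obese, 0 = non-obese."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else float("nan")  # true negative rate
    return accuracy, sensitivity, specificity


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Manhattan distance)."""
    dists = sorted(
        (sum(abs(a - b) for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if votes * 2 > k else 0


def cross_validate(X, y, k_folds=5, seed=0):
    """Predict each sample from a model that never saw it, then score once."""
    preds = [None] * len(y)
    for fold in kfold_indices(len(y), k_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            preds[i] = knn_predict(train_X, train_y, X[i])
    return confusion_metrics(y, preds)


# Synthetic stand-in data: 139 individuals, 9 SNP genotypes coded 0/1/2.
# These values are random and carry no biological meaning.
rng = random.Random(42)
y = [i % 2 for i in range(139)]
X = [[rng.choice([0, 1, 2]) for _ in range(9)] for _ in y]

acc, sens, spec = cross_validate(X, y)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

Because the synthetic features are unrelated to the labels, the printed scores should hover near chance; the point of the sketch is the evaluation procedure, not the numbers. Reporting sensitivity and specificity alongside accuracy matters here because a risk screen with high sensitivity but modest specificity, as reported for the SVM model, behaves very differently from what the accuracy figure alone suggests.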