A machine-learning approach for nonalcoholic steatohepatitis susceptibility estimation.

Affiliations

Department of Computer Engineering, Istanbul University Cerrahpaşa, 34320, Istanbul, Turkey.

Computer Programming, Vocational School, Nişantaşı University, 1453, Istanbul, Turkey.

Publication information

Indian J Gastroenterol. 2022 Oct;41(5):475-482. doi: 10.1007/s12664-022-01263-2. Epub 2022 Nov 11.

PMID:36367682

Abstract

BACKGROUND

Nonalcoholic steatohepatitis (NASH), a severe form of nonalcoholic fatty liver disease, can lead to advanced liver damage and has become an increasingly prominent health problem worldwide. Predictive models that identify high-risk individuals early could help target preventive and interventional measures. Traditional epidemiological models rely on statistical analysis and have limited predictive power. In the current study, a novel machine-learning approach was developed to predict individual susceptibility to NASH from candidate single nucleotide polymorphisms (SNPs).

METHODS

A total of 245 NASH patients and 120 healthy individuals were included in the study. Models for identifying at-risk individuals were developed from gender and the genotypes of five candidate SNPs: two in the cytochrome P450 family 2 subfamily E member 1 (CYP2E1) gene (rs6413432, rs3813867), two in the glucokinase regulator (GCKR) gene (rs780094, rs1260326), and rs738409 in patatin-like phospholipase domain-containing 3 (PNPLA3). To predict an individual's susceptibility to NASH, nine machine-learning models were constructed by combining three feature-input settings, namely all features, chi-square feature selection, and support vector machine recursive feature elimination (SVM-RFE), with three classification algorithms: k-nearest neighbor (KNN), multi-layer perceptron (MLP), and random forest (RF). Each model was trained on 80% of the data from both the NASH patients and the healthy controls and tested on the remaining 20% of each group. Model performance was compared in terms of accuracy, precision, sensitivity, and F-measure.
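The pipeline described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: it builds a synthetic cohort with the study's group sizes, assumes additive 0/1/2 coding of each SNP and 0/1 coding of gender (the abstract does not give the encoding), and covers only the all-features and chi-square settings with a KNN classifier; SVM-RFE, MLP, and RF are omitted. The chi-square score here is a contingency-table test of each feature against the label, which may differ from the variant the study used. The choice of k = 5 neighbors and a top-3 feature cut is arbitrary, and the helper names (`stratified_split`, `chi2_score`, `knn_predict`, `columns`, `synthetic_subject`) are hypothetical.

```python
import math
import random
from collections import Counter

# Feature names from the abstract. The 0/1/2 minor-allele coding of SNPs and the
# 0/1 coding of gender are assumptions; the abstract does not state the encoding.
FEATURES = ["rs6413432", "rs3813867", "rs780094", "rs1260326", "rs738409", "gender"]


def stratified_split(X, y, test_frac=0.2, seed=0):
    """Hold out test_frac of each class separately (80% of both groups for training)."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def chi2_score(column, y):
    """Pearson chi-square statistic of a categorical feature against the class label."""
    n = len(y)
    observed = Counter(zip(column, y))
    col_totals, y_totals = Counter(column), Counter(y)
    stat = 0.0
    for a in col_totals:
        for b in y_totals:
            expected = col_totals[a] * y_totals[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def columns(rows, keep):
    """Project each row onto the feature indices listed in keep."""
    return [[row[j] for j in keep] for row in rows]


# Synthetic cohort with the study's group sizes (1 = NASH, 0 = control). NOT study data.
rng = random.Random(42)


def synthetic_subject(label):
    # Random genotypes; cases get a made-up excess of rs738409 (index 4) variant alleles.
    row = [rng.choice([0, 0, 1, 1, 2]) for _ in range(5)] + [rng.randint(0, 1)]
    if label == 1 and rng.random() < 0.4:
        row[4] = 2
    return row


y = [1] * 245 + [0] * 120
X = [synthetic_subject(label) for label in y]
(train_X, train_y), (test_X, test_y) = stratified_split(X, y)

# Score features on the training split only, then keep the top 3.
scores = [chi2_score([row[j] for row in train_X], train_y) for j in range(len(FEATURES))]
top3 = sorted(range(len(FEATURES)), key=lambda j: -scores[j])[:3]

for setting, keep in [("all features", list(range(len(FEATURES)))), ("chi-square top 3", top3)]:
    tr, te = columns(train_X, keep), columns(test_X, keep)
    preds = [knn_predict(tr, train_y, x) for x in te]
    acc = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
    print(f"KNN, {setting}: accuracy = {acc:.2f}")
```

With 245 and 120 subjects, the per-group 80/20 split gives 196 + 96 = 292 training and 49 + 24 = 73 test subjects. Features are scored on the training split only, so the held-out 20% does not influence selection. Because the genotypes are synthetic, the printed accuracies say nothing about NASH; only the mechanics are meant to carry over.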

RESULTS

Among all nine machine-learning models, the KNN classifier with all features as input performed best, with an F-measure of 86% and an accuracy of 79%.
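An F-measure above accuracy is possible when the F-measure is computed for the majority class: NASH patients make up 245 of 365 subjects (about 67%). The abstract does not say which class the F-measure refers to, so treating NASH as the positive class is an assumption. The sketch below uses a hypothetical confusion matrix for a 73-subject test set (about 20% of each group: 49 NASH, 24 controls); the counts are invented for illustration, are only one of several combinations consistent with the rounded figures, and are not reported in the abstract.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity (recall) and F-measure for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, f_measure


# Hypothetical counts (NASH = positive); NOT the study's confusion matrix.
acc, prec, sens, f1 = metrics(tp=45, fp=11, fn=4, tn=13)
print(f"accuracy={acc:.2f} precision={prec:.2f} sensitivity={sens:.2f} F={f1:.2f}")
# prints: accuracy=0.79 precision=0.80 sensitivity=0.92 F=0.86
```

High sensitivity on the large NASH class lifts the F-measure, while errors on the smaller control group (here 11 of 24 misclassified) pull accuracy down, because the F-measure for the positive class ignores true negatives.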

CONCLUSIONS

Machine learning based on genetic variation may be applicable to estimating an individual's susceptibility to developing NASH among high-risk groups with a high degree of accuracy, precision, and sensitivity.
