Development and validation of a machine learning-based framework for assessing metabolic-associated fatty liver disease risk.

Affiliations

Zhongshan School of Medicine, Sun Yat-sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080, Guangdong, China.

School of Computer Science, China University of Geosciences, Wuhan, Beihe, 430074, China.

Publication information

BMC Public Health. 2024 Sep 18;24(1):2545. doi: 10.1186/s12889-024-19882-z.

Abstract

BACKGROUND

Existing predictive models for metabolic-associated fatty liver disease (MAFLD) have limitations that make them unsuitable for large-scale population screening. Using population health examination data, this study compared eight machine learning (ML) algorithms to build an optimal screening model for identifying individuals at high risk of MAFLD in China.

METHODS

We collected 2021 physical examination data from 5,171,392 adults living in northwestern China. Features were selected with Least Absolute Shrinkage and Selection Operator (LASSO) regression. To handle class imbalance, class-balancing parameters were incorporated into the models and combined with hyperparameter tuning. We developed tree-based ML models (Classification and Regression Trees, Random Forest, Adaptive Boosting, Light Gradient Boosting Machine, Extreme Gradient Boosting, and Categorical Boosting) as well as other ML models (k-Nearest Neighbors and an Artificial Neural Network) to identify individuals with MAFLD. Finally, we visualized the importance score of each feature in the selected model.
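
The two preprocessing ideas in the methods (LASSO feature selection and class balancing) can be sketched in plain Python. The abstract does not give the solver, penalty strength, or balancing scheme, so this is only a minimal sketch under stated assumptions: LASSO is fitted by cyclic coordinate descent on (1/2n)||y - Xb||^2 + alpha*||b||_1 (the scaling scikit-learn uses), a feature counts as "selected" when its coefficient is non-zero, and class weights follow the scikit-learn style "balanced" heuristic n / (k * n_c). All function names are hypothetical and this is not the study's code.

```python
from collections import Counter

# Hypothetical helpers for illustration only; not the study's implementation.


def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, alpha, max_iter=1000, tol=1e-10):
    """Minimise (1/(2n))*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate descent.

    Columns and y are centred, so the unpenalised intercept drops out.
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - X b while b == 0
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # correlation of feature j with the partial residual (b_j added back)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta  # keep residual in sync with beta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def selected_features(beta):
    """Indices of the features LASSO kept (non-zero coefficients)."""
    return [j for j, b in enumerate(beta) if b != 0.0]


def balanced_class_weights(labels):
    """'Balanced' weights n / (k * n_c): rarer classes receive larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Toy usage: y depends only on feature 0, so LASSO should drop feature 1.
X = [[i, (7 * i) % 5] for i in range(10)]
y = [3.0 * i for i in range(10)]
beta = lasso_coordinate_descent(X, y, alpha=0.1)
print(selected_features(beta))  # [0]
print(balanced_class_weights([0] * 8 + [1] * 2))  # {0: 0.625, 1: 2.5}
```

Larger values of `alpha` shrink more coefficients to exactly zero, which is what makes LASSO usable as a feature selector; in practice the penalty would be chosen by cross-validation.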

RESULTS

The mean age (standard deviation) of the 5,171,392 participants was 51.12 (15.00) years, and 52.47% were female. MAFLD was diagnosed by specialist physicians. After LASSO regression, 20 variables were retained for analysis. After ten rounds of cross-validation and hyperparameter optimization for each algorithm, the CatBoost algorithm performed best, with an area under the receiver operating characteristic curve (AUC) of 0.862. The feature importance ranking indicates that age, BMI, triglycerides, fasting plasma glucose, waist circumference, occupation, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, total cholesterol, systolic blood pressure, diastolic blood pressure, ethnicity, and cardiovascular disease are the 13 most important factors for MAFLD screening.
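
The AUC reported above equals the probability that a randomly chosen MAFLD case receives a higher risk score than a randomly chosen non-case, with ties counted as one half. A minimal sketch computes it from the Mann-Whitney rank statistic. The abstract does not say how feature importance was measured (CatBoost provides its own importance measures), so the sketch swaps in permutation importance, the mean drop in AUC after shuffling one feature column, purely as an illustration. The names `roc_auc` and `permutation_importance` and the toy `predict` function are hypothetical.

```python
import random

# Hypothetical helpers for illustration only; not the study's implementation.


def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic; tied scores get their average rank."""
    pairs = sorted(zip(scores, y_true), key=lambda t: t[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based mean rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in AUC when one feature column is shuffled (larger = more important)."""
    rng = random.Random(seed)
    baseline = roc_auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - roc_auc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Toy model that scores only on feature 0, so feature 1 should rank last.
X = [[i, i % 3] for i in range(10)]
y = [int(i >= 5) for i in range(10)]
scores = permutation_importance(lambda row: row[0], X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)  # [0, 1]
```

Permutation importance is model-agnostic, which is why it is used here; it can understate the importance of strongly correlated features such as BMI and waist circumference, since shuffling one leaves the other's information intact.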

CONCLUSION

This study used large-scale, multi-ethnic physical examination data from northwestern China to establish a more accurate and effective MAFLD risk screening model, offering a new perspective on the prediction and prevention of MAFLD.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e431/11412026/800fe868a6ea/12889_2024_19882_Fig1_HTML.jpg
