Meng Jing-Bi, An Zai-Jian, Jiang Chun-Shan
Central Laboratory, Yanbian University Hospital, Yanji, Jilin Province, China.
Department of Clinical Laboratory, Yanbian University Hospital, Yanji, Jilin Province, China.
PeerJ. 2025 Apr 9;13:e19248. doi: 10.7717/peerj.19248. eCollection 2025.
This study aimed to validate and optimize a machine learning algorithm for accurately predicting low-density lipoprotein cholesterol (LDL-C) levels, addressing limitations of traditional formulas, particularly in hypertriglyceridemia.
Six machine learning models, namely linear regression, K-nearest neighbors (KNN), decision tree, random forest, eXtreme Gradient Boosting (XGB), and a multilayer perceptron (MLP) regressor, were compared with conventional formulas (Friedewald, Martin, and Sampson) using lipid profiles from 120,174 subjects collected from 2020 to 2023. Predictive performance was evaluated against measured LDL-C values using R-squared (R²), mean squared error (MSE), and the Pearson correlation coefficient (PCC).
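As a rough illustration of the comparison described above, the sketch below implements two of the conventional formulas and the three evaluation metrics in plain Python. The Friedewald and Sampson equations are written in their commonly published mg/dL forms; the Martin method is omitted because it relies on a lookup table of adjustable TG divisors. The function names are illustrative assumptions, not the authors' code.

```python
from math import sqrt


def friedewald_ldl(tc, hdl, tg):
    """Friedewald estimate in mg/dL: LDL-C = TC - HDL-C - TG / 5."""
    return tc - hdl - tg / 5.0


def sampson_ldl(tc, hdl, tg):
    """Sampson estimate in mg/dL (commonly published form of the equation)."""
    non_hdl = tc - hdl
    return (tc / 0.948 - hdl / 0.971
            - (tg / 8.56 + tg * non_hdl / 2140.0 - tg ** 2 / 16100.0)
            - 9.44)


def mse(y_true, y_pred):
    """Mean squared error between measured and estimated values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, `friedewald_ldl(200, 50, 150)` gives 120 mg/dL, since 200 - 50 - 150/5 = 120. Each metric takes the measured LDL-C values as the first argument and the estimates as the second.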
Machine learning models outperformed traditional methods, with random forest and XGB achieving the highest accuracy (R² = 0.94, MSE = 89.25) on the internal dataset. Among the traditional formulas, the Sampson method performed best but showed reduced accuracy in the high triglyceride (TG) group (TG > 300 mg/dL). Machine learning models maintained high predictive power across all TG levels.
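One way to examine accuracy separately at high TG levels is to split records at the 300 mg/dL threshold mentioned above and compute the error within each group. The following is a minimal sketch under that assumption; the function name, the record layout, and the toy values in the usage note are hypothetical and are not data from the study.

```python
def mse_by_tg_group(records, threshold=300.0):
    """Return the MSE of LDL-C estimates below and above a TG threshold.

    records: iterable of (tg, measured_ldl, estimated_ldl) tuples in mg/dL.
    A group with no records is reported as None.
    """
    low, high = [], []
    for tg, measured, estimated in records:
        squared_error = (measured - estimated) ** 2
        if tg > threshold:
            high.append(squared_error)
        else:
            low.append(squared_error)

    def mean_or_none(values):
        return sum(values) / len(values) if values else None

    return {"low_tg": mean_or_none(low), "high_tg": mean_or_none(high)}
```

With purely illustrative input such as `[(100, 120, 118), (250, 130, 134), (350, 90, 80), (500, 70, 50)]`, the function reports one MSE for the two records at or below 300 mg/dL and another for the two records above it, which makes a drop in accuracy at high TG directly visible.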
Machine learning models estimate LDL-C more accurately than traditional formulas, especially at high TG levels where those formulas are less reliable. More precise LDL-C estimates could enhance cardiovascular risk assessment, potentially leading to better-informed treatment decisions and improved patient outcomes.