Ankara University Stem Cell Institute, Interdisciplinary Stem Cells and Regenerative Medicine, Ankara, Turkey.
Başkent University Faculty of Medicine, Department of Medical Biochemistry, Ankara, Turkey.
Lab Med. 2022 Mar 7;53(2):161-171. doi: 10.1093/labmed/lmab065.
Low-density lipoprotein cholesterol (LDL-C) can be estimated using the Friedewald and Martin-Hopkins formulas. We developed LDL-C prediction models using multiple machine learning methods and evaluated the validity of the new models alongside these established formulas.
Laboratory data (n = 59,415) on measured LDL-C, high-density lipoprotein cholesterol, triglycerides (TG), and total cholesterol were partitioned into training and test data sets. Linear regression, gradient-boosted trees, and artificial neural network (ANN) models were fitted to the training data. Paired-group comparisons were performed using a t-test and the Wilcoxon signed-rank test. We considered P values <.001 with an effect size >.2 to be statistically significant.
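For reference, the two formulas can be sketched as follows. This is a minimal illustration, not the code used in the study: the Friedewald estimate divides TG by a fixed factor of 5 (all values in mg/dL), whereas the Martin-Hopkins approach replaces that constant with an adjustable factor read from a published lookup table. The table is not reproduced here, so the `factor` argument and both function names are assumptions made for illustration.

```python
def friedewald_ldl(total_chol, hdl, tg):
    """Friedewald estimate of LDL-C in mg/dL: TC - HDL-C - TG/5."""
    return total_chol - hdl - tg / 5.0


def adjustable_factor_ldl(total_chol, hdl, tg, factor):
    """Martin-Hopkins-style estimate in mg/dL: TC - HDL-C - TG/factor.

    `factor` is assumed to be supplied by the caller from the published
    Martin-Hopkins lookup table, which this sketch does not include.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return total_chol - hdl - tg / factor


# Illustrative values only (not taken from the study data).
print(friedewald_ldl(200, 50, 150))
```

With a factor of 5 the second function reduces to the Friedewald estimate, which makes the relationship between the two formulas explicit.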
For TG ≥177 mg/dL, the Friedewald formula underestimated and the Martin-Hopkins formula overestimated LDL-C (P <.001), and this bias was more pronounced for LDL-C <70 mg/dL. The linear regression, gradient-boosted trees, and ANN models outperformed both formulas for TG ≥177 mg/dL and LDL-C <70 mg/dL, based on a comparison with a homogeneous assay (P >.001 vs. P <.001) and on classification accuracy.
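Classification accuracy in this setting can be read as the fraction of samples for which the estimated and the measured LDL-C fall into the same clinical category. The sketch below shows one way to compute it. The cut points (70, 100, 130, 160, and 190 mg/dL) and the function names are assumptions for illustration; the abstract itself states only the <70 mg/dL threshold.

```python
from bisect import bisect_right

# Assumed LDL-C category boundaries in mg/dL (not specified in the abstract).
CUT_POINTS = (70, 100, 130, 160, 190)


def ldl_category(ldl):
    """Return the index of the category containing `ldl` (0 means <70 mg/dL)."""
    return bisect_right(CUT_POINTS, ldl)


def classification_accuracy(estimated, measured):
    """Fraction of paired values that fall into the same LDL-C category."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum(
        ldl_category(e) == ldl_category(m)
        for e, m in zip(estimated, measured)
    )
    return agree / len(measured)
```

A value exactly on a boundary (for example 70 mg/dL) is assigned to the higher category, which matches the "<70 mg/dL" wording used above.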
Linear regression, gradient-boosted trees, and ANN models offer more accurate alternatives to the aforementioned formulas, especially for TG 177 to 399 mg/dL and LDL-C <70 mg/dL.