Dalio Institute of Cardiovascular Imaging, Weill Cornell Medicine, New York, New York, United States of America.
Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut, United States of America.
PLoS One. 2020 Sep 30;15(9):e0239934. doi: 10.1371/journal.pone.0239934. eCollection 2020.
Low-density lipoprotein cholesterol (LDL-C) is a target for cardiovascular prevention. Contemporary equations for estimating LDL-C have limited accuracy in certain settings, notably high triglycerides (TG) and very low LDL-C.
We derived a novel method for estimating LDL-C from the standard lipid profile using a machine learning (ML) approach based on random forests (the Weill Cornell model). We compared its correlation with directly measured LDL-C against that of the Friedewald and Martin-Hopkins equations.
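Both comparator equations share one functional form: LDL-C is estimated as TC minus HDL-C minus TG divided by a factor that approximates the TG-to-VLDL-C ratio (all values in mg/dL). Friedewald fixes that factor at 5, while Martin-Hopkins selects an adjustable factor from a lookup table stratified by TG and non-HDL-C. The sketch below shows only the shared form; the Martin-Hopkins table is not reproduced, so the caller must supply `tg_factor`, and the function names are hypothetical.

```python
def ldl_c_estimate(tc: float, hdl_c: float, tg: float, tg_factor: float = 5.0) -> float:
    """Estimate LDL-C (mg/dL) as TC - HDL-C - TG / tg_factor.

    tg_factor = 5 gives the Friedewald equation. Martin-Hopkins would look up
    tg_factor from its TG/non-HDL-C table, which is NOT included in this sketch.
    """
    if tg_factor <= 0:
        raise ValueError("tg_factor must be positive")
    # VLDL-C is approximated as TG divided by the factor.
    vldl_c = tg / tg_factor
    return tc - hdl_c - vldl_c


def friedewald(tc: float, hdl_c: float, tg: float) -> float:
    """Friedewald estimate: the shared form with the fixed factor of 5."""
    return ldl_c_estimate(tc, hdl_c, tg, tg_factor=5.0)


if __name__ == "__main__":
    # Illustrative inputs only: TC 200, HDL-C 50, TG 150 mg/dL.
    print(friedewald(200, 50, 150))  # 200 - 50 - 30 = 120.0
```

Because the factor sits in the denominator of the only TG term, any error in the assumed TG-to-VLDL-C ratio grows with TG, which is consistent with the abstract's note that the equations lose accuracy at high TG.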
The study cohort comprised a convenience sample of standard lipid profile measurements (directly measured total cholesterol [TC], high-density lipoprotein cholesterol [HDL-C], and TG) paired with chemical-based direct LDL-C measurements performed on the same day at New York-Presbyterian Hospital/Weill Cornell Medicine (NYP-WCM). An ML algorithm was then used to construct a model for LDL-C estimation. Results are reported on a held-out test set, with correlation coefficients and absolute residuals used to assess model performance.
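The evaluation protocol described above (a held-out test set, correlation coefficients, and absolute residuals) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' code: the 20% test fraction, the fixed seed, the use of Pearson's r, and all function names are hypothetical. The abstract does not say whether repeated profiles from the same individual were kept on one side of the split; this sketch simply splits by record.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(records: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and hold out a test set (hypothetical 20% default)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled records become the held-out test set.
    return shuffled[n_test:], shuffled[:n_test]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between estimated and measured LDL-C."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def absolute_residuals(estimated: Sequence[float],
                       measured: Sequence[float]) -> List[float]:
    """Per-record |estimated - measured|, in the input units (mg/dL)."""
    if len(estimated) != len(measured):
        raise ValueError("sequences must have equal length")
    return [abs(e - m) for e, m in zip(estimated, measured)]
```

In use, a model would be fit on the `train` records only, and `pearson_r` and `absolute_residuals` would be computed on the `test` records, so the reported metrics reflect data the model never saw.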
Between 2005 and 2019, 17,500 lipid profiles were performed on 10,936 unique individuals (4,456 females; 40.8%) aged 1 to 103 years. Correlation coefficients between estimated and measured LDL-C were 0.982 for the Weill Cornell model, compared with 0.950 for the Friedewald equation and 0.962 for the Martin-Hopkins method. The Weill Cornell model performed consistently better across subgroups stratified by LDL-C and TG values, including TG >500 mg/dL and LDL-C <70 mg/dL.
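Subgroup results such as those for TG >500 mg/dL and LDL-C <70 mg/dL amount to recomputing the correlation within each stratum. The sketch below is illustrative only: it assumes hypothetical record keys `tg`, `ldl_direct`, and `ldl_est` (mg/dL), assumes strata are defined on directly measured LDL-C, and returns `None` for any stratum where a correlation cannot be computed.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical record layout, values in mg/dL: {"tg", "ldl_direct", "ldl_est"}.
Record = Dict[str, float]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ValueError for constant input."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def subgroup_correlations(records: List[Record],
                          subgroups: Dict[str, Callable[[Record], bool]]
                          ) -> Dict[str, Optional[float]]:
    """Correlate estimated vs. direct LDL-C separately within each stratum."""
    results: Dict[str, Optional[float]] = {}
    for name, in_subgroup in subgroups.items():
        members = [r for r in records if in_subgroup(r)]
        if len(members) < 2:
            results[name] = None  # too few records to correlate
            continue
        try:
            results[name] = pearson_r([r["ldl_est"] for r in members],
                                      [r["ldl_direct"] for r in members])
        except ValueError:
            results[name] = None  # zero variance within the stratum
    return results


# Strata named in the abstract; stratifying on direct LDL-C is an assumption.
SUBGROUPS: Dict[str, Callable[[Record], bool]] = {
    "TG >500 mg/dL": lambda r: r["tg"] > 500,
    "LDL-C <70 mg/dL": lambda r: r["ldl_direct"] < 70,
}
```

Running the same function once per estimator (with `ldl_est` filled by the Weill Cornell model, Friedewald, or Martin-Hopkins in turn) gives a side-by-side comparison within each stratum.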
The ML model correlated more closely with direct LDL-C than either the Friedewald formula or the Martin-Hopkins equation, including in the settings of elevated TG and very low LDL-C.