Akyea Ralph K, Qureshi Nadeem, Kai Joe, Weng Stephen F
Primary Care Stratified Medicine, Division of Primary Care, University of Nottingham, Nottingham, UK.
NPJ Digit Med. 2020 Oct 30;3:142. doi: 10.1038/s41746-020-00349-5. eCollection 2020.
Familial hypercholesterolaemia (FH) is a common inherited disorder that causes lifelong elevation of low-density lipoprotein cholesterol (LDL-C). Most individuals with FH remain undiagnosed, which precludes opportunities to prevent premature heart disease and death. Some machine-learning (ML) approaches improve detection of FH in electronic health records, but their clinical impact is under-explored. We assessed the performance of an array of ML approaches for enhancing detection of FH, and their clinical utility, within a large primary care population. We conducted a retrospective cohort study using routine primary care clinical records of 4,027,775 individuals from the United Kingdom who had total cholesterol measured between 1 January 1999 and 25 June 2019. The predictive accuracy of five common ML algorithms (logistic regression, random forest, gradient boosting machines, neural networks (deep learning) and ensemble learning) was assessed for detecting FH. Predictive accuracy was measured by the area under the receiver operating characteristic curve (AUC) and the expected vs observed calibration slope. Clinical utility was measured by expected case-review workload and likelihood ratios. There were 7928 incident diagnoses of FH. In addition to known clinical features of FH (raised total cholesterol or LDL-C and family history of premature coronary heart disease), the ML algorithms identified features, such as raised triglycerides, that reduced the likelihood of FH. Apart from logistic regression (AUC, 0.81), all four other ML approaches had similarly high predictive accuracy (AUC > 0.89). Calibration slope ranged from 0.997 for gradient boosting machines to 1.857 for logistic regression. Among those screened, the proportion of high-probability cases requiring clinical review varied from 0.73% using ensemble learning to 10.16% using deep learning, with positive predictive values of 15.5% and 2.8% respectively. Ensemble learning had a markedly higher positive likelihood ratio (45.5) than all other ML models (7.0-14.4). Machine-learning models show similarly high accuracy in detecting FH, offering opportunities to increase diagnosis. However, the clinical case-finding workload required to yield cases will differ substantially between models.
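The clinical-utility measures named above (case-review workload, positive predictive value and positive likelihood ratio) can all be derived from a model's confusion matrix at a chosen probability threshold. The following is a minimal sketch of those standard definitions, not the authors' code. The names `ConfusionCounts`, `case_finding_metrics` and `ppv_from_lr` are hypothetical, and the counts are made up for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts at one probability threshold (hypothetical container)."""
    tp: int  # flagged by the model, has FH
    fp: int  # flagged by the model, no FH
    fn: int  # not flagged, has FH
    tn: int  # not flagged, no FH


def case_finding_metrics(c: ConfusionCounts) -> dict:
    """Standard screening metrics from confusion-matrix counts."""
    total = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)
    false_positive_rate = c.fp / (c.fp + c.tn)  # equals 1 - specificity
    return {
        # Share of the screened population flagged for clinical review.
        "workload": (c.tp + c.fp) / total,
        # Share of flagged individuals who truly have the condition.
        "ppv": c.tp / (c.tp + c.fp),
        # How much a positive flag multiplies the pre-test odds.
        "lr_positive": sensitivity / false_positive_rate,
        "prevalence": (c.tp + c.fn) / total,
    }


def ppv_from_lr(lr_positive: float, prevalence: float) -> float:
    """Convert a positive likelihood ratio and prevalence into a PPV via odds."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = lr_positive * pre_test_odds
    return post_test_odds / (1 + post_test_odds)


# Illustrative counts only (assumed, not from the paper).
counts = ConfusionCounts(tp=80, fp=920, fn=20, tn=98_980)
metrics = case_finding_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the positive predictive value depends on prevalence as well as on the likelihood ratio, two models with similar AUC can still imply very different review workloads per case found, which is the trade-off the abstract highlights.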