Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, United Kingdom.
Department of Medicine, Faculty of Medicine, Universidad de Sevilla, Sevilla, Spain.
J Am Heart Assoc. 2024 Jun 18;13(12):e034434. doi: 10.1161/JAHA.123.034434. Epub 2024 Jun 15.
Familial hypercholesterolemia (FH), while highly prevalent, is a significantly underdiagnosed monogenic disorder. Improved detection could reduce the large number of cardiovascular events attributable to poor case finding. We aimed to assess whether machine learning algorithms outperform clinical diagnostic criteria (signs, history, and biomarkers) and the recommended screening criteria in the United Kingdom in identifying individuals with FH-causing variants, and to present scalable screening criteria for general populations.
Analysis included UK Biobank participants with whole exome sequencing, classifying them as having FH when (likely) pathogenic variants were detected in their LDLR, APOB, or PCSK9 genes. Data were stratified into 3 data sets for (1) feature importance analysis; (2) deriving state-of-the-art statistical and machine learning models; and (3) evaluating the models' predictive performance against clinical diagnostic and screening criteria: Dutch Lipid Clinic Network, Simon Broome, Make Early Diagnosis to Prevent Early Death, and Familial Case Ascertainment Tool. One thousand and three of 454 710 participants were classified as having FH. A Stacking Ensemble model yielded the best predictive performance (sensitivity, 74.93%; precision, 0.61%; accuracy, 72.80%; area under the receiver operating characteristic curve, 79.12%) and outperformed clinical diagnostic criteria and the recommended screening criteria in identifying FH variant carriers within the validation data set (figures for Familial Case Ascertainment Tool, the best baseline model, were 69.55%, 0.44%, 65.43%, and 71.12%, respectively). Our model decreased the number needed to screen compared with the Familial Case Ascertainment Tool (164 versus 227).
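The abstract does not define how the number needed to screen was calculated. As a minimal sketch, assuming it is the reciprocal of precision (the number of model-flagged individuals who must be sequenced to find one variant carrier), the reported figures can be reproduced from the reported precision values. The function names below are illustrative and do not come from the paper.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, precision and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # carriers correctly flagged
        "precision": tp / (tp + fp),              # flagged individuals who are carriers
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def number_needed_to_screen(precision_percent: float) -> float:
    """Assumed definition: flagged individuals per confirmed carrier, 1 / precision."""
    if precision_percent <= 0:
        raise ValueError("precision must be positive")
    return 100.0 / precision_percent


# Precision values as reported in the abstract (percent).
print(round(number_needed_to_screen(0.61)))  # Stacking Ensemble -> 164
print(round(number_needed_to_screen(0.44)))  # Familial Case Ascertainment Tool -> 227
```

Under this assumption, the very low precision of both approaches reflects how rare carriers are in the cohort (1003 of 454 710), so a small absolute gain in precision translates into a sizeable drop in the number of people referred for genetic confirmation.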
Our machine learning-derived model provides a higher pretest probability of identifying individuals with a molecular diagnosis of FH compared with current approaches. This provides a promising, cost-effective, and scalable tool for implementation into electronic health records to prioritize potential FH cases for genetic confirmation.