Harnessing genotype and phenotype data for population-scale variant classification using large language models and Bayesian inference.

Author information

Manders Toby R, Tan Christopher A, Kobayashi Yuya, Wahl Alexander, Araya Carlos, Colavin Alexandre, Facio Flavia M, Metz Hillery, Reuter Jason, Frésard Laure, Padigepati Samskruthi R, Stafford David A, Nussbaum Robert L, Nykamp Keith

Affiliations

Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA.

Invitae Corporation, 1400 16th Street, San Francisco, CA, 94103, USA.

Publication information

Hum Genet. 2025 Apr 23. doi: 10.1007/s00439-025-02743-z.

Abstract

Variants of Uncertain Significance (VUS) in genetic testing for hereditary diseases burden patients and clinicians, yet clinical data that could reduce VUS are underutilized due to a lack of scalable strategies. We assessed whether a machine learning approach using genotype and phenotype data could improve variant classification and reduce VUS. In this cohort study of a multi-step machine learning approach, patient data from test requisition forms were used to distinguish patients with molecular diagnoses from controls ("patient score"). A generative Bayesian model then used patient scores and variant classifications to infer variant pathogenicity ("variant score"). The study included 3.5 million patients referred for clinical genetic testing across various conditions. Primary outcomes were model- and gene-level discrimination, classification performance, probabilistic calibration, and concordance with orthogonal pathogenicity measures. Integration into a semi-quantitative classification framework was based on posterior pathogenicity probabilities matching PPV ≥ 0.99/NPV ≥ 0.95 thresholds, followed by expert review. We generated 1,334 clinical variant models (CVMs); 595 showed high performance in both machine learning steps (AUROCpatient ≥ 0.8 and AUROCvariant ≥ 0.8) on held-out data. High-confidence predictions from these CVMs provided evidence for 5,362 VUS observed in 200,174 patients, representing 23.4% of all VUS observations in these genes. In 17 frequently tested genes, CVMs reclassified over 1,000 unique VUS, reducing VUS report rates by 9-49% per condition. In conclusion, a scalable machine learning approach using underutilized clinical data improved variant classification and reduced VUS.
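The PPV ≥ 0.99/NPV ≥ 0.95 thresholds mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy scores, and the simple exhaustive search are all assumptions made for illustration. It shows one plausible way to pick score cutoffs on a labeled reference set so that calls above the upper cutoff reach a target positive predictive value and calls below the lower cutoff reach a target negative predictive value.

```python
from typing import List, Optional

# Hypothetical helpers for illustration only; not taken from the study.

def lowest_cutoff_for_ppv(scores: List[float], labels: List[int],
                          min_ppv: float = 0.99) -> Optional[float]:
    """Lowest cutoff t such that the calls 'score >= t' have PPV >= min_ppv.

    labels: 1 = known pathogenic, 0 = known benign.
    """
    qualifying = []
    for t in set(scores):
        called = [y for s, y in zip(scores, labels) if s >= t]
        if sum(called) / len(called) >= min_ppv:
            qualifying.append(t)
    return min(qualifying) if qualifying else None


def highest_cutoff_for_npv(scores: List[float], labels: List[int],
                           min_npv: float = 0.95) -> Optional[float]:
    """Highest cutoff t such that the calls 'score <= t' have NPV >= min_npv."""
    qualifying = []
    for t in set(scores):
        called = [y for s, y in zip(scores, labels) if s <= t]
        if called.count(0) / len(called) >= min_npv:
            qualifying.append(t)
    return max(qualifying) if qualifying else None


def evidence_call(score: float, benign_cutoff: float,
                  pathogenic_cutoff: float) -> str:
    """Map a score to an evidence call using the two cutoffs."""
    if score >= pathogenic_cutoff:
        return "pathogenic evidence"
    if score <= benign_cutoff:
        return "benign evidence"
    return "no evidence"


# Toy reference set (made-up values): scores with known labels.
toy_scores = [0.01, 0.02, 0.05, 0.10, 0.40, 0.60, 0.90, 0.95, 0.98, 0.99]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

pathogenic_cutoff = lowest_cutoff_for_ppv(toy_scores, toy_labels)
benign_cutoff = highest_cutoff_for_npv(toy_scores, toy_labels)
print(pathogenic_cutoff, benign_cutoff)
print(evidence_call(0.5, benign_cutoff, pathogenic_cutoff))
```

Scores falling between the two cutoffs yield no evidence in either direction, which mirrors the abstract's emphasis on using only high-confidence predictions. In the study itself, the thresholds were applied to posterior pathogenicity probabilities and followed by expert review.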

