
LEAP: Using machine learning to support variant classification in a clinical setting.

Affiliations

Data Science, Color Genomics, Burlingame, California.

Scientific Affairs, Color Genomics, Burlingame, California.

Publication information

Hum Mutat. 2020 Jun;41(6):1079-1090. doi: 10.1002/humu.24011. Epub 2020 Apr 1.

DOI: 10.1002/humu.24011
PMID: 32176384
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7317941/
Abstract

Advances in genome sequencing have led to a tremendous increase in the discovery of novel missense variants, but evidence for determining clinical significance can be limited or conflicting. Here, we present Learning from Evidence to Assess Pathogenicity (LEAP), a machine learning model that utilizes a variety of feature categories to classify variants, and achieves high performance in multiple genes and different health conditions. Feature categories include functional predictions, splice predictions, population frequencies, conservation scores, protein domain data, and clinical observation data such as personal and family history and covariant information. L2-regularized logistic regression and random forest classification models were trained on missense variants detected and classified during the course of routine clinical testing at Color Genomics (14,226 variants from 24 cancer-related genes and 5,398 variants from 30 cardiovascular-related genes). Using 10-fold cross-validated predictions, the logistic regression model achieved an area under the receiver operating characteristic curve (AUROC) of 97.8% (cancer) and 98.8% (cardiovascular), while the random forest model achieved 98.3% (cancer) and 98.6% (cardiovascular). We demonstrate generalizability to different genes by validating predictions on genes withheld from training (96.8% AUROC). High accuracy and broad applicability make LEAP effective in the clinical setting as a high-throughput quality control layer.
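
The evaluation protocol named in the abstract (an L2-regularized logistic regression scored by AUROC on 10-fold cross-validated predictions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic stand-ins rather than LEAP's features, the helper names (`auroc`, `fit_logreg_l2`, `cross_val_predict`) and hyperparameters are hypothetical, and the plain gradient-descent fit is not the authors' implementation.

```python
import math
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg_l2(X, y, l2=0.01, lr=0.5, epochs=200):
    """Batch gradient descent on mean log-loss + (l2/2)*||w||^2 (bias unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def cross_val_predict(X, y, k=10, seed=0):
    """Out-of-fold predictions: each sample is scored by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    preds = [None] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_logreg_l2([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(fold, predict_proba(model, [X[i] for i in fold])):
            preds[i] = p
    return preds


# Synthetic stand-in data (NOT the LEAP feature set): two noisy informative columns.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(300)]
X = [[rng.gauss(1.5 * yi, 1.0), rng.gauss(-1.0 * yi, 1.0)] for yi in y]

oof = cross_val_predict(X, y, k=10)
cv_auroc = auroc(y, oof)
print(f"10-fold cross-validated AUROC: {cv_auroc:.3f}")
```

Pooling the out-of-fold scores before computing a single AUROC mirrors the phrase "10-fold cross-validated predictions". The gene-withheld validation described in the abstract would instead assign folds by gene, so that every variant from a held-out gene is excluded from training.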

