CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions.

Affiliations

Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.

Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany.

Publication information

Nucleic Acids Res. 2024 Jan 5;52(D1):D1143-D1154. doi: 10.1093/nar/gkad989.

DOI: 10.1093/nar/gkad989
PMID: 38183205
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10767851/
Abstract

Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
