CNVoyant a machine learning framework for accurate and explainable copy number variant classification.

Affiliations

The Office of Data Sciences, The Abigail Wexner Research Institute at Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA.

The Steve and Cindy Rasmussen Institute for Genomic Medicine, The Abigail Wexner Research Institute, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH, 43215, USA.

Publication information

Sci Rep. 2024 Sep 28;14(1):22411. doi: 10.1038/s41598-024-72470-4.

DOI:10.1038/s41598-024-72470-4
PMID:39333267
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11437066/
Abstract

The precise classification of copy number variants (CNVs) presents a significant challenge in genomic medicine, primarily due to the complex nature of CNVs and their diverse impact on rare genetic diseases (RGDs). This complexity is compounded by the limitations of existing methods in accurately distinguishing between benign, uncertain, and pathogenic CNVs. Addressing this gap, we introduce CNVoyant, a machine learning-based multi-class framework designed to enhance the clinical significance classification of CNVs. Trained on a comprehensive dataset of 52,176 ClinVar entries across pathogenic, uncertain, and benign classifications, CNVoyant incorporates a broad spectrum of genomic features, including genome position, disease-gene annotations, dosage sensitivity, and conservation scores. Models to predict the clinical significance of copy number gains and losses were trained independently. Final models were selected after testing 29 machine learning architectures and 10,000 hyperparameter combinations each for deletions and duplications via fivefold cross-validation. We validate the performance of CNVoyant by leveraging a comprehensive set of 21,574 CNVs from the DECIPHER database, a highly regarded resource known for its extensive catalog of chromosomal imbalances linked to clinical outcomes. Compared to alternative approaches, CNVoyant shows marked improvements in precision-recall and ROC AUC metrics for binary pathogenic classifications while going one step further, offering multi-classification of clinical significance and corresponding SHAP explainability plots. Additionally, when provided germline CNV calls from real-world RGD cases with diagnostic CNV(s), CNVoyant correctly classified all diagnostic CNVs as having pathogenic significance with high confidence. 
This large-scale validation demonstrates CNVoyant's superior accuracy and underscores its potential to aid genomic researchers and clinical geneticists in interpreting the clinical implications of real CNVs.
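
The abstract compares tools on ROC AUC for binary pathogenic classification while CNVoyant itself outputs three classes. The sketch below illustrates one way such a comparison can be set up. It is not the authors' code: the class order, the use of the pathogenic-class probability as the binary score, and the toy probabilities are all assumptions made for illustration.

```python
from typing import Sequence

# Hypothetical class order for a three-class output (an assumption, not from the paper).
CLASSES = ("benign", "uncertain", "pathogenic")


def pathogenic_score(probs: Sequence[float]) -> float:
    """Collapse a three-class probability vector to a single pathogenic score."""
    return probs[CLASSES.index("pathogenic")]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC computed as the fraction of (positive, negative) pairs in which
    the positive has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy predictions (made up): one probability triple per CNV, in CLASSES order.
predictions = [
    (0.80, 0.15, 0.05),
    (0.10, 0.30, 0.60),
    (0.30, 0.50, 0.20),
    (0.05, 0.15, 0.80),
    (0.20, 0.45, 0.35),
    (0.25, 0.35, 0.40),
]
# Toy ground truth: 1 = pathogenic, 0 = not pathogenic.
is_pathogenic = [0, 1, 0, 1, 1, 0]

scores = [pathogenic_score(p) for p in predictions]
auc = roc_auc(is_pathogenic, scores)
print(f"ROC AUC = {auc:.3f}")
```

The pairwise formulation is quadratic in the number of variants, which is fine for a toy example; for a set the size of the 21,574 DECIPHER CNVs a rank-based implementation would be the more practical choice.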


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f970/11437066/893257bd0e20/41598_2024_72470_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f970/11437066/4181956c263b/41598_2024_72470_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f970/11437066/74eb187896cc/41598_2024_72470_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f970/11437066/77cba508e686/41598_2024_72470_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f970/11437066/e3c0a04f80da/41598_2024_72470_Fig5_HTML.jpg


