A random forest-based framework for genotyping and accuracy assessment of copy number variations.

Authors

Zhuang Xuehan, Ye Rui, So Man-Ting, Lam Wai-Yee, Karim Anwarul, Yu Michelle, Ngo Ngoc Diem, Cherny Stacey S, Tam Paul Kwong-Hang, Garcia-Barcelo Maria-Mercè, Tang Clara Sze-Man, Sham Pak Chung

Affiliations

Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China.

Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China.

Publication information

NAR Genom Bioinform. 2020 Sep 22;2(3):lqaa071. doi: 10.1093/nargab/lqaa071. eCollection 2020 Sep.

DOI:10.1093/nargab/lqaa071
PMID:33575619
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7671382/
Abstract

Detection of copy number variations (CNVs) is essential for uncovering genetic factors underlying human diseases. However, CNV detection by current methods is prone to error, and precisely identifying CNVs from paired-end whole genome sequencing (WGS) data is still challenging. Here, we present a framework, CNV-JACG, for Judging the Accuracy of CNVs and Genotyping using paired-end WGS data. CNV-JACG is based on a random forest model trained on 21 distinctive features characterizing the CNV region and its breakpoints. Using data from the 1000 Genomes Project, the Genome in a Bottle Consortium, the Human Genome Structural Variation Consortium and in-house technical replicates, we show that CNV-JACG has superior sensitivity over the latest genotyping method, SV2, particularly for small CNVs (≤1 kb). We also demonstrate that CNV-JACG outperforms SV2 in terms of Mendelian inconsistency in trios and concordance between technical replicates. Our study suggests that CNV-JACG would be a useful tool for assessing the accuracy of CNVs, meeting the ever-growing need to uncover the missing heritability linked to CNVs.
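The two evaluation criteria named in the abstract, Mendelian inconsistency in trios and concordance between technical replicates, can be illustrated with a minimal Python sketch. This is not the CNV-JACG implementation. It assumes autosomal, biallelic CNV sites whose genotype is coded as the number of variant alleles (0, 1 or 2), and it treats any de novo event as an inconsistency. The function names and the example genotypes are hypothetical.

```python
from itertools import product

# Assumption: genotypes are coded as the count of variant alleles at an
# autosomal, biallelic CNV site (0 = hom-ref, 1 = het, 2 = hom-variant).
# Each parent transmits exactly one allele (0 = reference, 1 = variant).
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can be built from one allele per parent."""
    possible = {
        f + m
        for f, m in product(TRANSMITTABLE[father], TRANSMITTABLE[mother])
    }
    return child in possible


def mendelian_inconsistency_rate(trios) -> float:
    """Fraction of (father, mother, child) genotype triples that are inconsistent."""
    trios = list(trios)
    if not trios:
        raise ValueError("no trio genotypes supplied")
    violations = sum(
        not is_mendelian_consistent(f, m, c) for f, m, c in trios
    )
    return violations / len(trios)


def replicate_concordance(rep_a, rep_b) -> float:
    """Fraction of sites where two technical replicates have the same genotype."""
    rep_a, rep_b = list(rep_a), list(rep_b)
    if len(rep_a) != len(rep_b) or not rep_a:
        raise ValueError("replicates must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rep_a, rep_b))
    return matches / len(rep_a)


# Hypothetical genotypes, for illustration only.
example_trios = [(0, 0, 0), (1, 0, 1), (1, 1, 2), (0, 0, 1), (2, 2, 1)]
print(mendelian_inconsistency_rate(example_trios))           # 0.4
print(replicate_concordance([0, 1, 2, 1], [0, 1, 1, 1]))     # 0.75
```

A lower inconsistency rate and a higher replicate concordance both indicate more reliable genotype calls, which is the sense in which the abstract compares CNV-JACG with SV2.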

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b1d/7671382/66a48ea46f22/lqaa071fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b1d/7671382/b588c7bc40b8/lqaa071fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b1d/7671382/db19e151e286/lqaa071fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b1d/7671382/3e8415535017/lqaa071fig4.jpg
