
EnsembleCNV: an ensemble machine learning algorithm to identify and genotype copy number variation using SNP array data.

Affiliations

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Publication information

Nucleic Acids Res. 2019 Apr 23;47(7):e39. doi: 10.1093/nar/gkz068.

DOI:10.1093/nar/gkz068
PMID:30722045
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6468244/
Abstract

The associations between diseases/traits and copy number variants (CNVs) have not been systematically investigated in genome-wide association studies (GWASs), primarily due to a lack of robust and accurate tools for CNV genotyping. Herein, we propose a novel ensemble learning framework, ensembleCNV, to detect and genotype CNVs using single nucleotide polymorphism (SNP) array data. EnsembleCNV (a) identifies and eliminates batch effects at raw data level; (b) assembles individual CNV calls into CNV regions (CNVRs) from multiple existing callers with complementary strengths by a heuristic algorithm; (c) re-genotypes each CNVR with local likelihood model adjusted by global information across multiple CNVRs; (d) refines CNVR boundaries by local correlation structure in copy number intensities; (e) provides direct CNV genotyping accompanied with confidence score, directly accessible for downstream quality control and association analysis. Benchmarked on two large datasets, ensembleCNV outperformed competing methods and achieved a high call rate (93.3%) and reproducibility (98.6%), while concurrently achieving high sensitivity by capturing 85% of common CNVs documented in the 1000 Genomes Project. Given this CNV call rate and accuracy, which are comparable to SNP genotyping, we suggest ensembleCNV holds significant promise for performing genome-wide CNV association studies and investigating how CNVs predispose to human diseases.
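
Step (b) of the abstract, assembling per-sample CNV calls from several callers into CNV regions (CNVRs), can be illustrated with a minimal sketch. The abstract does not describe the heuristic itself, so the code below substitutes a plain interval union: calls on the same chromosome that overlap are merged into one region, and the region records which callers and samples support it. The names `CnvCall`, `Cnvr`, `assemble_cnvrs`, and the caller labels are hypothetical and are not taken from the ensembleCNV software.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CnvCall:
    """One CNV call from one caller in one sample (hypothetical record layout)."""
    caller: str
    sample: str
    chrom: str
    start: int  # inclusive start coordinate
    end: int    # inclusive end coordinate


@dataclass
class Cnvr:
    """A merged CNV region plus the callers and samples that support it."""
    chrom: str
    start: int
    end: int
    callers: set = field(default_factory=set)
    samples: set = field(default_factory=set)


def assemble_cnvrs(calls):
    """Merge overlapping calls into regions, chromosome by chromosome.

    This is a simple interval union, used here as a stand-in; it is NOT
    the heuristic algorithm of the paper, which the abstract leaves unspecified.
    """
    by_chrom = defaultdict(list)
    for call in calls:
        by_chrom[call.chrom].append(call)

    regions = []
    for chrom in sorted(by_chrom):
        current = None
        # Sorting by start lets a single left-to-right sweep find all overlaps.
        for call in sorted(by_chrom[chrom], key=lambda c: (c.start, c.end)):
            if current is not None and call.start <= current.end:
                # Overlaps the open region: extend it and record the support.
                current.end = max(current.end, call.end)
                current.callers.add(call.caller)
                current.samples.add(call.sample)
            else:
                # No overlap: open a new region.
                current = Cnvr(chrom, call.start, call.end,
                               {call.caller}, {call.sample})
                regions.append(current)
    return regions


if __name__ == "__main__":
    demo_calls = [
        CnvCall("callerA", "s1", "chr1", 100, 200),
        CnvCall("callerB", "s1", "chr1", 150, 260),
        CnvCall("callerA", "s2", "chr1", 500, 600),
        CnvCall("callerC", "s2", "chr2", 10, 20),
    ]
    for region in assemble_cnvrs(demo_calls):
        print(region.chrom, region.start, region.end,
              sorted(region.callers), sorted(region.samples))
```

A real pipeline would likely need a stricter merging criterion (for example, a minimum reciprocal overlap) so that long chains of marginally overlapping calls do not collapse into one oversized region. Steps (c) and (d) of the abstract, re-genotyping and boundary refinement, are separate stages that this sketch does not attempt.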

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f30/6468244/c4de67151363/gkz068fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f30/6468244/3454e4e830d4/gkz068fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f30/6468244/8aea083f7ea7/gkz068fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f30/6468244/0677b279e025/gkz068fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f30/6468244/1eb7ece0cb82/gkz068fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f30/6468244/48e9d0b972f9/gkz068fig6.jpg
