Identification of recurrent regions of Copy-Number Variants across multiple individuals.

Affiliation

Department of Epidemiology and Public Health, National University of Singapore, 16 Medical Drive, Singapore.

Publication information

BMC Bioinformatics. 2010 Mar 22;11:147. doi: 10.1186/1471-2105-11-147.

DOI:10.1186/1471-2105-11-147

PMID:20307285

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2851607/

Abstract

BACKGROUND

Algorithms and software for CNV detection have been developed, but they detect CNV regions sample by sample, with individual-specific breakpoints, whereas common CNV regions are likely to occur at the same genomic locations across different individuals in a homogeneous population. Current algorithms for detecting common CNV regions do not account for the varying reliability of individual CNVs, which SNP-based CNV detection algorithms typically report as confidence scores. General methodologies for identifying these recurrent regions, especially ones directed at SNP arrays, are still needed.

RESULTS

In this paper, we describe two new approaches for identifying common CNV regions based on (i) the frequency of occurrence of reliable CNVs, where reliability is determined by high confidence scores, and (ii) a weighted frequency of occurrence of CNVs, where the weights are determined by the confidence scores. In addition, motivated by the fact that we often observe partially overlapping CNV regions as a mixture of two or more distinct subregions, regions identified using the two approaches can be fine-tuned to smaller sub-regions using a clustering algorithm. We compared the performance of the methods with sequencing-based results in terms of discordance rates, rates of departure from Hardy-Weinberg equilibrium (HWE) and average frequency and size of the identified regions. The discordance rates as well as the rates of departure from HWE decrease when we select CNVs with higher confidence scores. We also performed comparisons with two previously published methods, STAC and GISTIC, and showed that the methods we consider are better at identifying low-frequency but high-confidence CNV regions.
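The two frequency-based approaches (i) and (ii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: per-sample CNV calls are assumed to be inclusive marker-index intervals with one confidence score each, a sample is counted at most once per marker, the weight in (ii) is assumed to be the confidence score divided by the largest score observed, and `CNVCall`, `min_conf` and `cutoff` are hypothetical names. Regions are formed from runs of consecutive markers whose (weighted) frequency reaches the cutoff; the clustering step that splits a region into subregions is not shown.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class CNVCall(NamedTuple):
    """One per-sample CNV call on a marker (SNP) index scale (hypothetical layout)."""
    sample: str
    start: int          # first marker index covered (inclusive)
    end: int            # last marker index covered (inclusive)
    confidence: float   # confidence score reported by the CNV caller


def reliable_frequency(calls: List[CNVCall], n_markers: int, n_samples: int,
                       min_conf: float) -> List[float]:
    """Approach (i): per-marker fraction of samples with a CNV of confidence >= min_conf."""
    carriers: List[Set[str]] = [set() for _ in range(n_markers)]
    for c in calls:
        if c.confidence >= min_conf:
            for m in range(c.start, c.end + 1):
                carriers[m].add(c.sample)  # a sample counts once per marker
    return [len(s) / n_samples for s in carriers]


def weighted_frequency(calls: List[CNVCall], n_markers: int,
                       n_samples: int) -> List[float]:
    """Approach (ii): per-marker frequency where each call is weighted by its confidence.

    Assumption: weight = confidence / largest confidence among all calls.
    """
    max_conf = max((c.confidence for c in calls), default=1.0)
    best: List[Dict[str, float]] = [{} for _ in range(n_markers)]
    for c in calls:
        w = c.confidence / max_conf
        for m in range(c.start, c.end + 1):
            # keep only the largest weight per sample at this marker
            best[m][c.sample] = max(best[m].get(c.sample, 0.0), w)
    return [sum(d.values()) / n_samples for d in best]


def call_regions(freq: List[float], cutoff: float) -> List[Tuple[int, int]]:
    """Merge consecutive markers with frequency >= cutoff into (start, end) regions."""
    regions: List[Tuple[int, int]] = []
    start = None
    for m, f in enumerate(freq):
        if f >= cutoff and start is None:
            start = m
        elif f < cutoff and start is not None:
            regions.append((start, m - 1))
            start = None
    if start is not None:
        regions.append((start, len(freq) - 1))
    return regions


if __name__ == "__main__":
    calls = [
        CNVCall("s1", 2, 6, 30.0),
        CNVCall("s2", 3, 7, 25.0),
        CNVCall("s3", 3, 5, 5.0),     # low-confidence call
        CNVCall("s4", 10, 12, 40.0),
    ]
    freq = reliable_frequency(calls, n_markers=15, n_samples=4, min_conf=10.0)
    print(call_regions(freq, cutoff=0.5))    # [(3, 6)]
    wfreq = weighted_frequency(calls, n_markers=15, n_samples=4)
    print(call_regions(wfreq, cutoff=0.3))   # [(3, 6)]
```

In the example, the low-confidence call from `s3` is discarded by approach (i) but still contributes a small weight under approach (ii); at the chosen cutoffs both report markers 3 to 6 as a common region.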

CONCLUSIONS

The proposed methods for identifying common CNV regions in multiple individuals perform well compared to existing methods. The identified common regions can be used for downstream analyses such as group comparisons in association studies.
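As one hedged illustration of such a group comparison, the sketch below tests whether carriers of a single identified region are over-represented among cases relative to controls, using a two-sided Fisher exact test on the 2x2 carrier table. The choice of test, the set-based inputs and the helper names `region_table` and `fisher_exact_two_sided` are assumptions for illustration, not the procedure used in the paper.

```python
from math import comb
from typing import Set, Tuple


def region_table(carriers: Set[str], cases: Set[str],
                 controls: Set[str]) -> Tuple[int, int, int, int]:
    """Build the 2x2 table (a, b, c, d) for one region.

    a/b: cases with/without the CNV; c/d: controls with/without it.
    """
    a = len(carriers & cases)
    c = len(carriers & controls)
    return a, len(cases) - a, c, len(controls) - c


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d      # number of cases, number of controls
    col1 = a + c                   # total number of carriers
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among cases, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    cases = {f"case{i}" for i in range(10)}
    controls = {f"ctrl{i}" for i in range(6)}
    carriers = {f"case{i}" for i in range(8)} | {"ctrl0"}
    table = region_table(carriers, cases, controls)
    print(table, fisher_exact_two_sided(*table))
```

Here 8 of 10 cases but only 1 of 6 controls carry the region, giving the table (8, 2, 1, 5) and p ≈ 0.035. In a genome-wide scan, one such test per region would normally be followed by a multiple-testing correction.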

Similar articles

Comprehensive performance comparison of high-resolution array platforms for genome-wide Copy Number Variation (CNV) analysis in humans.

BMC Genomics. 2017 Apr 24;18(1):321. doi: 10.1186/s12864-017-3658-x.

The effect of algorithms on copy number variant detection.

PLoS One. 2010 Dec 30;5(12):e14456. doi: 10.1371/journal.pone.0014456.

Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort.

BMC Genomics. 2012 Jun 15;13:241. doi: 10.1186/1471-2164-13-241.

The fine-scale and complex architecture of human copy-number variation.

Am J Hum Genet. 2008 Mar;82(3):685-95. doi: 10.1016/j.ajhg.2007.12.010. Epub 2008 Jan 24.

Noise cancellation using total variation for copy number variation detection.

BMC Bioinformatics. 2018 Oct 22;19(Suppl 11):361. doi: 10.1186/s12859-018-2332-x.

Sensitive and accurate detection of copy number variants using read depth of coverage.

Genome Res. 2009 Sep;19(9):1586-92. doi: 10.1101/gr.092981.109. Epub 2009 Aug 5.

High resolution discovery and confirmation of copy number variants in 90 Yoruba Nigerians.

Genome Biol. 2009;10(11):R125. doi: 10.1186/gb-2009-10-11-r125. Epub 2009 Nov 9.

Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays.

Genome Res. 2006 Dec;16(12):1575-84. doi: 10.1101/gr.5629106. Epub 2006 Nov 22.

Improved detection of global copy number variation using high density, non-polymorphic oligonucleotide probes.

BMC Genet. 2008 Mar 28;9:27. doi: 10.1186/1471-2156-9-27.

Cited by

Increased copy-number variant load of associated risk genes in sporadic cases of amyotrophic lateral sclerosis.

Cell Mol Life Sci. 2024 Jul 27;81(1):316. doi: 10.1007/s00018-024-05335-8.

Association test using Copy Number Profile Curves (CONCUR) enhances power in rare copy number variant analysis.

PLoS Comput Biol. 2020 May 4;16(5):e1007797. doi: 10.1371/journal.pcbi.1007797. eCollection 2020 May.

A large interactive visual database of copy number variants discovered in taurine cattle.

Gigascience. 2019 Jun 1;8(6). doi: 10.1093/gigascience/giz073.

Accumulation of potential driver genes with genomic alterations predicts survival of high-risk neuroblastoma patients.

Biol Direct. 2018 Jul 16;13(1):14. doi: 10.1186/s13062-018-0218-5.

Association between copy-number variation on metabolic phenotypes and HDL-C levels in patients with polycystic ovary syndrome.

Mol Biol Rep. 2017 Feb;44(1):51-61. doi: 10.1007/s11033-016-4080-1. Epub 2016 Nov 22.

Genome instability model of metastatic neuroblastoma tumorigenesis by a dictionary learning algorithm.

BMC Med Genomics. 2015 Sep 10;8:57. doi: 10.1186/s12920-015-0132-y.

Integrated molecular portrait of non-small cell lung cancers.

BMC Med Genomics. 2013 Dec 3;6:53. doi: 10.1186/1755-8794-6-53.

Copy number variation signature to predict human ancestry.

BMC Bioinformatics. 2012 Dec 27;13:336. doi: 10.1186/1471-2105-13-336.

Accuracy of CNV Detection from GWAS Data.

PLoS One. 2011 Jan 13;6(1):e14511. doi: 10.1371/journal.pone.0014511.

References

Origins and functional impact of copy number variation in the human genome.

Nature. 2010 Apr 1;464(7289):704-12. doi: 10.1038/nature08516. Epub 2009 Oct 7.

Joint estimation of copy number variation and reference intensities on multiple DNA arrays using GADA.

Bioinformatics. 2009 May 15;25(10):1223-30. doi: 10.1093/bioinformatics/btp119. Epub 2009 Mar 10.

Integrated detection and population-genetic analysis of SNPs and copy number variation.

Nat Genet. 2008 Oct;40(10):1166-74. doi: 10.1038/ng.238. Epub 2008 Sep 7.

A fast Bayesian change point analysis for the segmentation of microarray data.

Bioinformatics. 2008 Oct 1;24(19):2143-8. doi: 10.1093/bioinformatics/btn404. Epub 2008 Jul 29.

Mapping and sequencing of structural variation from eight human genomes.

Nature. 2008 May 1;453(7191):56-64. doi: 10.1038/nature06862.

Sparse representation and Bayesian detection of genome copy number alterations from microarray data.

Bioinformatics. 2008 Feb 1;24(3):309-18. doi: 10.1093/bioinformatics/btm601. Epub 2008 Jan 18.

Weighted clustering of called array CGH data.

Biostatistics. 2008 Jul;9(3):484-500. doi: 10.1093/biostatistics/kxm048. Epub 2007 Dec 22.

Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma.

Proc Natl Acad Sci U S A. 2007 Dec 11;104(50):20007-12. doi: 10.1073/pnas.0710052104. Epub 2007 Dec 6.

PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data.

Genome Res. 2007 Nov;17(11):1665-74. doi: 10.1101/gr.6861907. Epub 2007 Oct 5.

Assessing the significance of conserved genomic aberrations using high resolution genomic microarrays.

PLoS Genet. 2007 Aug;3(8):e143. doi: 10.1371/journal.pgen.0030143.
