Estimating genome-wide copy number using allele-specific mixture models.

Authors

Wang Wenyi, Carvalho Benilton, Miller Nathaniel D, Pevsner Jonathan, Chakravarti Aravinda, Irizarry Rafael A

Affiliation

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.

Publication

J Comput Biol. 2008 Sep;15(7):857-66. doi: 10.1089/cmb.2007.0148.

DOI:10.1089/cmb.2007.0148

PMID:18707534

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2612042/

Abstract

Genomic changes such as copy number alterations are among the major underlying causes of human phenotypic variation among normal and disease subjects. Array comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology provides only a >30-kb resolution, which limits the ability to detect copy number alterations spanning small regions. Higher resolution technologies such as single nucleotide polymorphism (SNP) microarrays allow detection of copy number alterations at least as small as several thousand base pairs. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. Various groups have proposed statistical procedures that pool data from neighboring locations to improve precision. However, these procedures need to average across relatively large regions to work effectively, which greatly reduces resolution. Recently, regression-type models that account for probe effects have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution, specifically designed for single-point estimation, that provides various advantages over the existing methodology. We use a 314-sample database to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy number. We can then compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (www.bioconductor.org).
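The prediction rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the log-intensities of the A and B alleles at one SNP are independent Gaussians whose means depend only on that allele's copy number. The names `MEAN_BY_COPY`, `SD`, `prior_weight`, `posterior`, and `call`, and all numeric values, are hypothetical and chosen for illustration; in the paper the conditional distributions are fitted from the 314-sample database.

```python
import math
from itertools import product

# Hypothetical parameters (illustrative only, not taken from the paper):
# mean log2 intensity of one allele's probes given that allele's copy number.
MEAN_BY_COPY = {0: 7.0, 1: 9.0, 2: 10.0}
SD = 0.4  # assumed common standard deviation


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prior_weight(c_a, c_b):
    """Unnormalized prior weight; assumes a total copy number of 2 is most common."""
    return 0.9 if c_a + c_b == 2 else 0.1


def posterior(a, b):
    """Posterior probability of each allele-specific state (c_a, c_b)
    given observed log intensities a and b, by Bayes' rule."""
    weights = {}
    for c_a, c_b in product(MEAN_BY_COPY, repeat=2):
        likelihood = (normal_pdf(a, MEAN_BY_COPY[c_a], SD)
                      * normal_pdf(b, MEAN_BY_COPY[c_b], SD))
        weights[(c_a, c_b)] = prior_weight(c_a, c_b) * likelihood
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def call(a, b):
    """Return the most probable state and its posterior as a confidence measure."""
    post = posterior(a, b)
    best = max(post, key=post.get)
    return best, post[best]
```

Calling `call(9.0, 9.0)` returns the state with the largest posterior together with that posterior probability, which serves as the per-call confidence. A posterior for total copy number follows by summing the probabilities of all states with the same `c_a + c_b`.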
