Macé Aurélien, Kutalik Zoltán, Valsesia Armand
Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland.
Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
Methods Mol Biol. 2018;1793:231-258. doi: 10.1007/978-1-4939-7868-7_14.
Differences between genomes can be due to single nucleotide polymorphisms (SNPs), translocations, inversions and copy number variants (CNVs, i.e. gains or losses of DNA). CNVs can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign, but those larger than 250 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease or phenotypic traits. While the link between SNPs and disease susceptibility has been well studied, very few CNV genome-wide association studies have been published to date. This is probably because CNV analysis remains a somewhat more complex task than SNP analysis, both in terms of the bioinformatics workflow and of the uncertainty in CNV calling, which leads to high false-positive rates and unknown false-negative rates. This chapter explains computational methods for the analysis of CNVs, ranging from study design, data processing and quality control up to genome-wide association studies with clinical traits.
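To make the gain/loss distinction and the 250 kb size threshold above concrete, here is a minimal sketch that classifies CNV calls and keeps the large ones. It is not the chapter's pipeline. The `CnvCall` record, its field names and the `large_cnvs` helper are hypothetical, and the sketch assumes a diploid reference (copy number 2, i.e. autosomes). Size alone is only a filter, not a pathogenicity assessment.

```python
from dataclasses import dataclass

# Hypothetical minimal CNV call record; field names are illustrative only.
@dataclass
class CnvCall:
    chrom: str
    start: int        # 0-based start (inclusive)
    end: int          # end (exclusive)
    copy_number: int  # estimated integer copy number

    @property
    def length(self) -> int:
        # Length in base pairs under half-open [start, end) coordinates.
        return self.end - self.start

    @property
    def kind(self) -> str:
        # Assumes a diploid reference (copy number 2), as on autosomes.
        if self.copy_number > 2:
            return "gain"
        if self.copy_number < 2:
            return "loss"
        return "neutral"

LARGE_CNV_BP = 250_000  # size threshold quoted in the abstract (250 kb)

def large_cnvs(calls):
    """Hypothetical helper: keep gains/losses strictly larger than 250 kb."""
    return [c for c in calls if c.kind != "neutral" and c.length > LARGE_CNV_BP]

calls = [
    CnvCall("chr1", 1_000_000, 1_050_000, 1),    # 50 kb loss (kept out: too small)
    CnvCall("chr15", 20_000_000, 20_400_000, 3), # 400 kb gain (kept)
    CnvCall("chr2", 5_000_000, 5_300_000, 2),    # copy-neutral (kept out)
]

for c in large_cnvs(calls):
    print(c.chrom, c.kind, c.length)  # prints: chr15 gain 400000
```

In a real study the copy-number estimates come from a CNV caller and carry the calling uncertainty described above, so such a filter would normally follow the quality-control steps rather than replace them.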