Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA.
Science. 2010 Oct 29;330(6004):641-6. doi: 10.1126/science.1197005.
Copy number variants affect both disease and normal phenotypic variation, but those lying within heavily duplicated, highly identical sequence have been difficult to assay. By analyzing short-read mapping depth for 159 human genomes, we demonstrated accurate estimation of absolute copy number for duplications as small as 1.9 kilobase pairs, ranging from 0 to 48 copies. We identified 4.1 million "singly unique nucleotide" positions informative in distinguishing specific copies and used them to genotype the copy and content of specific paralogs within highly duplicated gene families. These data identify human-specific expansions in genes associated with brain development, reveal extensive population genetic diversity, and detect signatures consistent with gene conversion in the human species. Our approach makes ~1000 genes accessible to genetic studies of disease association.
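The read-depth idea in the abstract can be illustrated with a minimal sketch. It assumes depth scales linearly with copy number and that a set of control regions is known to be diploid. The abstract does not describe the actual pipeline, so the function names, the normalization, and the toy numbers below are hypothetical and only show the arithmetic: total copy number comes from depth relative to the diploid baseline, and paralog-specific copy number comes from the fraction of reads carrying the paralog-specific base at "singly unique nucleotide" (SUN) positions.

```python
from statistics import mean


def estimate_copy_number(region_depth, control_depths, control_copy=2):
    """Estimate absolute copy number of a region from its mean read depth.

    Assumption (not from the source): depth is proportional to copy number,
    and control_depths come from regions with a known copy number of 2.
    """
    baseline = mean(control_depths)
    if baseline <= 0:
        raise ValueError("control depth must be positive")
    return control_copy * region_depth / baseline


def paralog_copy_number(total_copy, sun_read_counts):
    """Estimate the copy number of one paralog from reads at SUN positions.

    sun_read_counts is a list of (paralog_specific_reads, total_reads) pairs,
    one pair per SUN position (hypothetical input layout).
    """
    fractions = [spec / total for spec, total in sun_read_counts if total > 0]
    if not fractions:
        raise ValueError("no SUN position has read coverage")
    return total_copy * mean(fractions)


# Toy data: control regions average 30x, the duplicated region is at 90x.
control = [30.0, 31.0, 29.0, 30.0]
total = estimate_copy_number(90.0, control)

# At each SUN position, one third of the reads carry the paralog-specific base.
specific = paralog_copy_number(total, [(10, 30), (12, 36), (9, 27)])

print(round(total), round(specific))
```

With these toy numbers the region is estimated at about 6 copies in total, of which about 2 are attributed to the paralog tagged by the SUN positions. Real data would need rounding rules and noise handling that this sketch leaves out.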