Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany.
Genome Res. 2011 Oct;21(10):1672-85. doi: 10.1101/gr.125047.111. Epub 2011 Aug 3.
Independent determination of both haplotype sequences of an individual genome is essential to relate genetic variation to genome function, phenotype, and disease. To address the importance of phase, we have generated the most complete haplotype-resolved genome to date, "Max Planck One" (MP1), by fosmid pool-based next-generation sequencing. Virtually all SNPs (>99%) and 80,000 indels were phased into haploid sequences of up to 6.3 Mb (N50 ~1 Mb). The completeness of phasing allowed determination of the concrete molecular haplotype pairs for the vast majority of genes (81%), including potential regulatory sequences, of which >90% were found to be constituted by two different molecular forms. A subset of 159 genes with potentially severe mutations in either cis or trans configurations exemplified in particular the role of phase for gene function, disease, and clinical interpretation of personal genomes (e.g., BRCA1). Extended genomic regions harboring manifold combinations of physically and/or functionally related genes and regulatory elements were resolved into their underlying "haploid landscapes," which may define the functional genome. Moreover, the majority of genes and functional sequences were found to contain individual or rare SNPs, which cannot be phased from population data alone, emphasizing the importance of molecular phasing for characterizing a genome in its molecular individuality. Our work provides the foundation to understand that the distinction of molecular haplotypes is essential to resolve the (inherently individual) biology of genes, genomes, and disease, establishing a reference point for "phase-sensitive" personal genomics. MP1's annotated haploid genomes are available as a public resource.
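Two notions in the abstract, the N50 of the phased haploid sequences and the cis versus trans configuration of two mutations in one gene, can be made concrete with a minimal sketch. This is not the authors' pipeline. The function names `n50` and `configuration`, the genotype encoding (0 = reference allele, 1 = mutant allele, ordered as haplotype 1 then haplotype 2), and all numeric values are assumptions chosen for illustration only.

```python
from typing import List, Tuple


def n50(lengths: List[int]) -> int:
    """Return the N50 of a set of phased block lengths.

    N50 is the length L of the block at which the blocks, taken from
    longest to shortest, first cover at least half of the total length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def configuration(gt_a: Tuple[int, int], gt_b: Tuple[int, int]) -> str:
    """Classify two heterozygous, phased mutations as 'cis' or 'trans'.

    Each genotype is (allele on haplotype 1, allele on haplotype 2),
    with 0 = reference and 1 = mutant (hypothetical encoding).
    'cis'   : both mutant alleles lie on the same haplotype.
    'trans' : the mutant alleles lie on different haplotypes.
    """
    if sorted(gt_a) != [0, 1] or sorted(gt_b) != [0, 1]:
        raise ValueError("both variants must be heterozygous")
    return "cis" if gt_a == gt_b else "trans"


# Toy block lengths in kb (illustrative values, not data from the paper).
toy_blocks = [5, 4, 3, 2, 1]
toy_n50 = n50(toy_blocks)

# Two mutations in the same gene: same haplotype versus opposite haplotypes.
same_copy = configuration((1, 0), (1, 0))
both_copies = configuration((1, 0), (0, 1))
```

In the trans case both copies of the gene carry a mutation, whereas in the cis case one copy remains unaffected. Unphased genotype calls cannot distinguish the two, and that distinction is what molecular phasing supplies.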