Dong Jiaqiang, Feng Yaping, Kumar Dibyendu, Zhang Wei, Zhu Tingting, Luo Ming-Cheng, Messing Joachim
Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854;
Department of Plant Sciences, University of California, Davis, CA 95616.
Proc Natl Acad Sci U S A. 2016 Jul 19;113(29):7949-56. doi: 10.1073/pnas.1608775113. Epub 2016 Jun 27.
Haplotype variation involves not only SNPs but also insertions and deletions, in particular gene copy number variations. However, comparing individual genomes has been difficult because traditional sequencing methods produce reads too short to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. One such case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41-48 copies of the alpha zein gene family, spread over six loci spanning chromosomal regions of between 30 and 500 kb, were described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions were possible because of overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions from the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing appeared to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with the previously analyzed regions of the SS lines reveals differences between these two heterotic groups, some of them dramatic.
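The error-rate estimate in the abstract rests on comparing assembled sequence against full-length cDNAs. The sketch below illustrates only the arithmetic behind such a figure and is not the authors' pipeline: it assumes the two sequences have already been aligned and padded with `-` for gaps, and the function name `alignment_error_rate` and the toy sequences are hypothetical.

```python
def alignment_error_rate(reference: str, assembled: str) -> float:
    """Return the fraction of alignment columns where two gapped sequences differ.

    Both inputs are assumed to be pre-aligned and of equal length, with '-'
    marking gaps. A mismatch or a gap opposite a base counts as one error.
    """
    if len(reference) != len(assembled):
        raise ValueError("aligned sequences must have equal length")
    if not reference:
        raise ValueError("alignment is empty")
    errors = sum(
        1 for r, a in zip(reference.upper(), assembled.upper()) if r != a
    )
    return errors / len(reference)


# Toy, made-up data (not from the paper): 2,000 aligned columns, one mismatch.
unit = "ATGGCTACCAAGATTTTAGC"                 # 20 bases
cdna = unit * 100                             # hypothetical cDNA sequence
assembly = unit * 99 + "ATGGCTACCAAGATTTTAGA" # last base differs

rate = alignment_error_rate(cdna, assembly)
print(f"error rate: {rate:.4%}")
print("below 0.1% threshold:", rate < 0.001)
```

Counting gap columns as errors is a design choice made here because insertions and deletions are a common error type in long-read data; a real comparison would also need to handle introns when aligning cDNAs to genomic sequence, which this sketch ignores.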