National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.
Arizona Genomics Institute and BIO5 Institute, School of Plant Sciences, University of Arizona, Tucson, Arizona 85721, USA.
Sci Data. 2016 Sep 13;3:160076. doi: 10.1038/sdata.2016.76.
Over the past 30 years, we have performed many fundamental studies on two Oryza sativa subsp. indica varieties, Zhenshan 97 (ZS97) and Minghui 63 (MH63). To improve the resolution of many of these investigations, we generated two reference-quality genome assemblies using the most advanced sequencing technologies. Using PacBio SMRT technology, we produced over 108 Gb (ZS97) and 174 Gb (MH63) of raw sequence data from 166 (ZS97) and 209 (MH63) pools of BAC clones. Using Illumina sequencing technology, we also generated ~97 Gb (ZS97) and ~74 Gb (MH63) of paired-end whole-genome shotgun (WGS) sequence data. With these data, we assembled two platinum-standard reference genomes, both of which have been publicly released. Here we provide the full sets of raw data used to generate these two assemblies. These data sets can be used to test new programs for genome assembly and annotation, to aid the discovery of new insights into genome structure, function, and evolution, and to support biological research in general.