Wang Xiaobin, Sui Weiguo, Wu Weiqing, Hou Xianliang, Ou Minglin, Xiang Yueying, Dai Yong
Health Management Centre, The Affiliated Guilin Hospital, Southern Medical University, Guilin, Guangxi 541000, P.R. China; Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China.
Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China; Department of Nephrology, Guilin 181st Hospital, Guilin, Guangxi 541000, P.R. China.
Exp Ther Med. 2016 Nov;12(5):3143-3150. doi: 10.3892/etm.2016.3797. Epub 2016 Oct 11.
The advent of next-generation sequencing technology has significantly decreased the cost of sequencing. However, sequencing costs remain high for large-scale studies. In the present study, DNA pooling was applied as a cost-effective sequencing strategy, and the results of whole-genome resequencing of 100 healthy individuals using DNA pooling are presented. To minimise the likelihood of systematic sampling bias, paired-end libraries with an insert size of 500 bp were prepared for all samples and subjected to whole-genome sequencing using four lanes per library, resulting in at least 30-fold haploid coverage for each sample. The NCBI human genome build 37 (hg19) was used as the reference genome, and the short reads aligned to it achieved 99.84% coverage with an average sequencing depth of 32.76. In total, ~3 million single-nucleotide polymorphisms were identified, of which 99.88% were present in the NCBI dbSNP database. In addition, ~600,000 small insertions/deletions, 500,000 structural variants, 5,000 copy number variations and 13,000 single-nucleotide variants were identified. The present study reports, for the first time, whole-genome sequencing of a small sample of subjects from southern China. New variation sites were identified by comparison with the reference sequence, adding new knowledge of human genome variation to the human genomic databases. Finally, the particular regions in which variation is distributed were illustrated by analysing various types of variant sites, such as single-nucleotide polymorphisms.