Huang Jie, Liang Xinming, Xuan Yuankai, Geng Chunyu, Li Yuxiang, Lu Haorong, Qu Shoufang, Mei Xianglin, Chen Hongbo, Yu Ting, Sun Nan, Rao Junhua, Wang Jiahao, Zhang Wenwei, Chen Ying, Liao Sha, Jiang Hui, Liu Xin, Yang Zhaopeng, Mu Feng, Gao Shangxian
National Institutes for Food and Drug Control (NIFDC), No.2, Tiantan Xili Dongcheng District, Beijing 10050, P. R. China.
BGI-Shenzhen, Bei Shan Industrial Zone, Yantian District, Shenzhen, Guangdong Province, 518083, P. R. China.
Gigascience. 2017 May 1;6(5):1-9. doi: 10.1093/gigascience/gix024.
BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated their accuracy, and compared it to that of variations identified from similar amounts of publicly available HiSeq2500 data. For single nucleotide polymorphisms (SNPs), detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) was similar to that of the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%) and better than that of the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). For insertions and deletions (indels), however, we found lower accuracy for the BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50, respectively; sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as a reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.