Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
Sci Rep. 2013;3:2161. doi: 10.1038/srep02161.
The recent development of massively parallel sequencing technology has allowed the creation of comprehensive catalogs of genetic variation. However, due to the relatively high sequencing error rate of short-read sequence data, sophisticated analysis methods are required to obtain high-quality variant calls. Here, we developed a probabilistic multinomial method for single-sample detection of single nucleotide variants (SNVs) and short insertions and deletions (indels) in whole genome sequencing (WGS) and whole exome sequencing (WES) data. Evaluation with DNA genotyping arrays revealed a concordance rate of 99.98% for WGS calls and 99.99% for WES calls. Sanger sequencing of the discordant calls determined the false positive and false negative rates, respectively, for the WGS (0.0068% and 0.17%) and WES (0.0036% and 0.0084%) datasets. Furthermore, short indels were identified with high accuracy (WGS: 94.7%, WES: 97.3%). We believe our method can contribute to a better understanding of human diseases.
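The abstract does not spell out the model, so the following is only a minimal sketch of what a multinomial genotype likelihood for single-sample SNV calling can look like, not the authors' implementation. It assumes a diploid genotype, a single uniform per-base error rate spread evenly over the three other bases, and a flat prior over the ten possible genotypes. The names `allele_probs`, `log_multinomial`, and `call_genotype`, and the default error rate of 0.01, are hypothetical and chosen for illustration.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def allele_probs(genotype, error_rate):
    """Probability of reading each base at a site with the given diploid genotype.

    Assumption: each read samples one of the two alleles with probability 0.5,
    and a sequencing error turns it into any of the other three bases equally.
    """
    probs = {}
    for base in BASES:
        p = 0.0
        for allele in genotype:
            p += 0.5 * ((1.0 - error_rate) if base == allele else error_rate / 3.0)
        probs[base] = p
    return probs


def log_multinomial(counts, probs):
    """Log of the multinomial probability of the observed base counts."""
    n = sum(counts.values())
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts.values())
    return log_coef + sum(c * math.log(probs[b]) for b, c in counts.items() if c > 0)


def call_genotype(counts, error_rate=0.01):
    """Return the most likely genotype and its posterior under a flat prior.

    `counts` maps bases to read counts at one site; `error_rate` must be > 0.
    """
    counts = {b: counts.get(b, 0) for b in BASES}
    scores = {
        g: log_multinomial(counts, allele_probs(g, error_rate))
        for g in combinations_with_replacement(BASES, 2)
    }
    best = max(scores, key=scores.get)
    # Normalise in log space to avoid underflow at high coverage.
    top = scores[best]
    total = sum(math.exp(s - top) for s in scores.values())
    return "".join(best), 1.0 / total


if __name__ == "__main__":
    # Hypothetical pileup: 15 reads with A, 14 with G, 1 with C.
    print(call_genotype({"A": 15, "G": 14, "C": 1}))
```

In this sketch a site would be reported as a variant when the called genotype differs from the homozygous reference genotype and the posterior exceeds some chosen threshold. A production caller would also need to account for base and mapping qualities, strand bias, and indel realignment, none of which are modelled here.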