Nho Kwangsik, West John D, Li Huian, Henschel Robert, Bharthur Apoorva, Tavares Michel C, Saykin Andrew J
Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA ; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA.
Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA.
IEEE Int Conf Systems Biol. 2014 Oct;2014:59-62. doi: 10.1109/ISB.2014.6990432.
Rapid advancement of next-generation sequencing (NGS) technologies has facilitated the search for genetic susceptibility factors that influence disease risk in the field of human genetics. In particular, whole genome sequencing (WGS) has been used to obtain the most comprehensive catalog of an individual's genetic variation and to evaluate that variation in detail. To this end, sophisticated methods are required to accurately call high-quality variants and genotypes simultaneously across a cohort of individuals from raw sequence data. Using chromosome 22 from 818 WGS samples from the Alzheimer's Disease Neuroimaging Initiative (ADNI), which is the largest WGS data set related to a single disease, we compared two multi-sample variant calling methods for detecting single nucleotide variants (SNVs) and short insertions and deletions (indels) in WGS: (1) reduce the analysis-ready reads (BAM) file to a manageable size by keeping only the information essential for variant calling ("") and (2) call variants individually on each sample and then perform a joint genotyping analysis of the variant files produced for all samples in the cohort (""). "" identified 515,210 SNVs and 60,042 indels, while "" identified 358,303 SNVs and 52,855 indels. "" identified many more SNVs and indels than "". The two methods had a concordance rate of 99.60% for SNVs and 99.06% for indels. For SNVs, evaluation against HumanOmni 2.5M genotyping arrays revealed a concordance rate of 99.68% for "" and 99.50% for "". "" needed more computational time and memory than "". Our findings indicate that multi-sample variant calling using the "" process is a promising strategy for variant detection, which should facilitate our understanding of the underlying pathogenesis of human diseases.
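The abstract reports concordance rates between call sets but does not define the calculation. The sketch below shows one common reading, offered as an assumption rather than the authors' procedure: the fraction of sites present in both call sets whose genotypes agree. The names `concordance_rate`, `Site`, and `Genotype`, and the toy call sets, are hypothetical and for illustration only.

```python
from typing import Dict, Tuple

# A variant site keyed by (chromosome, position, reference allele, alternate allele).
Site = Tuple[str, int, str, str]
# An unphased genotype as allele indices, e.g. (0, 1) for a heterozygous call.
Genotype = Tuple[int, ...]


def normalize(genotype: Genotype) -> Genotype:
    """Sort allele indices so that (1, 0) and (0, 1) compare as equal."""
    return tuple(sorted(genotype))


def concordance_rate(calls_a: Dict[Site, Genotype],
                     calls_b: Dict[Site, Genotype]) -> float:
    """Fraction of sites shared by both call sets with identical genotypes.

    Assumption: sites present in only one call set are ignored, not counted
    as discordant. The source does not say how such sites were handled.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two call sets share no sites")
    matches = sum(
        1 for site in shared
        if normalize(calls_a[site]) == normalize(calls_b[site])
    )
    return matches / len(shared)


# Toy data for one sample (made up, not taken from ADNI).
toy_a = {
    ("22", 100, "A", "G"): (0, 1),
    ("22", 200, "C", "T"): (1, 1),
    ("22", 300, "G", "A"): (0, 1),
    ("22", 400, "T", "C"): (0, 0),
    ("22", 500, "A", "T"): (0, 1),  # called only in toy_a, so ignored
}
toy_b = {
    ("22", 100, "A", "G"): (1, 0),  # same genotype, different allele order
    ("22", 200, "C", "T"): (1, 1),
    ("22", 300, "G", "A"): (1, 1),  # discordant
    ("22", 400, "T", "C"): (0, 0),
}

if __name__ == "__main__":
    print(f"{concordance_rate(toy_a, toy_b):.2%}")
```

The same function could compare a sequencing call set against array genotypes, provided both are first restricted to the sites the array assays and keyed the same way.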