State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Institute of Plant Biology, Center for Evolutionary Biology, School of Life Sciences, and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, Institute of Biodiversity Sciences, Fudan University, Shanghai 200433, China;
Department of Biology and the Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC 27599-3280; Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, NC 27599-3280; and.
Proc Natl Acad Sci U S A. 2014 Jul 8;111(27):10007-12. doi: 10.1073/pnas.1321897111. Epub 2014 Jun 23.
DNA polymorphisms are important markers in genetic analyses and are increasingly detected by using genome resequencing. However, the presence of repetitive sequences and structural variants can lead to false positives in the identification of polymorphic alleles. Here, we describe an analysis strategy that minimizes false positives in allelic detection and present analyses of recently published resequencing data from Arabidopsis meiotic products and individual humans. Our analysis enables the accurate detection of sequencing errors, small insertions and deletions (indels), and structural variants, including large reciprocal indels and copy number variants, from comparisons between the resequenced and reference genomes. We offer an alternative interpretation of the sequencing data of meiotic products, including the number and type of recombination events, to illustrate the potential for mistakes in single-nucleotide polymorphism calling. Using these examples, we propose that the detection of DNA polymorphisms using resequencing data needs to account for nonallelic homologous sequences.