Lam Hugo Y K, Mu Xinmeng Jasmine, Stütz Adrian M, Tanzer Andrea, Cayting Philip D, Snyder Michael, Kim Philip M, Korbel Jan O, Gerstein Mark B
Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA.
Nat Biotechnol. 2010 Jan;28(1):47-55. doi: 10.1038/nbt.1600. Epub 2009 Dec 27.
Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.
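The read-scanning idea in the abstract can be illustrated with a minimal sketch. It is not the published BreakSeq implementation: the names `Junction`, `deletion_junction`, `spans_breakpoint` and `count_support`, the exact-substring matching, and the `flank` and `min_overlap` parameters are all assumptions made for illustration, and a real pipeline would use a proper read aligner that tolerates mismatches. The sketch builds the junction sequence of a deleted allele from a toy reference and counts reads that cross the breakpoint with enough bases on both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """One breakpoint-library entry (hypothetical structure)."""
    sv_id: str
    sequence: str    # left flank joined directly to right flank
    breakpoint: int  # offset in `sequence` where the two flanks meet


def deletion_junction(sv_id, reference, start, end, flank):
    """Junction of the deleted allele: reference bases [start, end) are removed."""
    left = reference[max(0, start - flank):start]
    right = reference[end:end + flank]
    return Junction(sv_id, left + right, len(left))


def spans_breakpoint(read, junction, min_overlap):
    """True if `read` occurs exactly in the junction sequence and covers at
    least `min_overlap` bases on each side of the breakpoint."""
    pos = junction.sequence.find(read)
    while pos != -1:
        left_cover = junction.breakpoint - pos
        right_cover = pos + len(read) - junction.breakpoint
        if left_cover >= min_overlap and right_cover >= min_overlap:
            return True
        pos = junction.sequence.find(read, pos + 1)
    return False


def count_support(reads, library, min_overlap=10):
    """Count, per SV, the reads that span its junction."""
    counts = {j.sv_id: 0 for j in library}
    for read in reads:
        for j in library:
            if spans_breakpoint(read, j, min_overlap):
                counts[j.sv_id] += 1
    return counts


# Toy data (invented for this example): a 14-base deletion at [12, 26).
reference = "AAACCCGGGTTT" + "GATTACAGATTACA" + "TTGACCATGCAA"
junction = deletion_junction("del1", reference, start=12, end=26, flank=8)

reads = [
    "GGGTTTTTGACC",  # crosses the deletion junction, 6 bases on each side
    "GGGTTTGATTAC",  # matches the undeleted reference, not the junction
    "CCGGGTTT",      # lies entirely in the left flank
]
support = count_support(reads, [junction], min_overlap=4)
print(support)  # {'del1': 1}
```

Requiring a minimum overlap on both sides of the breakpoint is what separates reads that support the variant from reads that merely fall in one flank, which would also match the unaltered reference.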