Institute of Biotechnology, University of Helsinki, FI-00014 Helsinki, Finland.
Genome Res. 2022 Aug 25;32(8):1437-1447. doi: 10.1101/gr.276478.121.
Variation within human genomes is unevenly distributed, and variants show spatial clustering. DNA replication-related template switching is a poorly understood mutational mechanism capable of causing major chromosomal rearrangements as well as creating short inverted sequence copies that appear as local mutation clusters in sequence comparisons. In this study, haplotype-resolved genome assemblies representing 25 human populations and multinucleotide variants aggregated from 140,000 human sequencing experiments were reanalyzed. Local template switching could explain thousands of complex mutation clusters across the human genome, with the affected loci segregating within and between populations. During the study, computational tools were developed for identifying template switch events using both short-read sequencing data and genotype data, and for genotyping candidate loci using short-read data. The characteristics of template-switch mutations complicate their detection: widely used analysis pipelines for short-read sequencing data, normally capable of identifying single-nucleotide changes, were found to miss template-switch mutations of tens of base pairs, potentially invalidating medical genetic studies searching for a causative allele behind genetic diseases. Combined with the massive sequencing data now available for humans, the novel tools described here enable building catalogs of affected loci and studying the cellular mechanisms behind template switching in both healthy organisms and disease.
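To illustrate why a single template switch event can look like a cluster of independent mutations, the following is a minimal toy sketch, not the tools developed in the study. It assumes a highly simplified model in which a stretch of the sequence is overwritten by the reverse complement of a nearby region of the same length. The function names (`reverse_complement`, `apply_template_switch`, `mismatch_positions`) and the example sequence are hypothetical and chosen only for illustration.

```python
# Toy model (an assumption for illustration): one template switch event
# replaces a target segment with the reverse complement of a nearby source
# segment, which shows up as several mismatches in a base-by-base comparison.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_template_switch(ref: str, target_start: int,
                          source_start: int, length: int) -> str:
    """Overwrite ref[target_start:target_start+length] with the reverse
    complement of ref[source_start:source_start+length]."""
    copied = reverse_complement(ref[source_start:source_start + length])
    return ref[:target_start] + copied + ref[target_start + length:]


def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    ref = "GATTACAGGCTTCAGTCCAAGT"  # made-up sequence, not real data
    mutated = apply_template_switch(ref, target_start=8,
                                    source_start=10, length=6)
    print(ref)
    print(mutated)
    print("mismatch positions:", mismatch_positions(ref, mutated))
```

In this toy model all mismatches fall inside the overwritten window, so a caller that treats each position independently would report several nearby single-nucleotide changes instead of one event.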