Saxena Ayush Shekhar, Baer Charles F
Department of Biology, University of Florida, Gainesville, FL, USA.
Present address - Regeneron Pharmaceuticals Inc., Tarrytown, NY, USA.
bioRxiv. 2025 Mar 25:2025.03.22.644739. doi: 10.1101/2025.03.22.644739.
The importance of genomic structural variants (SVs) is well appreciated, but much less is known about their mutational properties than about those of single nucleotide variants (SNVs) and short indels. The reason is simple: the longer the mutation, the less likely it is to be covered by a single sequencing read, and thus the harder it is to map unambiguously to a unique genomic location. Here we report SV mutation rate estimates from six mutation accumulation (MA) lines from two strains of Caenorhabditis elegans (N2 and PB306) using long-read (PacBio) sequencing. The inferred SV mutation rate is ~1/10 the SNV rate and ~1/4 the short indel rate. We identified 40 mutations and removed 52 false positives (FPs) by manual inspection of each SV call. Excluding one atypical line (5 mutations, 35 FPs), the signal (inferred mutant) to noise (FP) ratio is approximately 2:1. False negative rates were determined by simulating variants in the reference genome and measuring recall. Recall exceeds 90% for short indels and declines as SV length increases. Small deletions have nearly the same recall rate as small insertions (~100 bp), but deletions have higher recall rates than insertions as size increases. The reported SV mutation rate is therefore likely an underestimate. A quarter of the identified SV mutations occur in SV hotspots that harbor pre-existing low-complexity repeat variation. By comparing the spectrum of spontaneous SVs to that of wild isolates, we infer that natural selection is not only efficient at removing SVs in exons, but also removes roughly half of SVs in intergenic regions.
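The signal-to-noise figure quoted in the abstract can be checked with simple arithmetic. The sketch below uses only the counts stated above (40 mutations, 52 FPs, and one atypical line with 5 mutations and 35 FPs); the function and variable names are illustrative and are not taken from the authors' pipeline.

```python
# Arithmetic check of the signal-to-noise ratio quoted in the abstract.
# All counts come from the abstract; names here are hypothetical.

def signal_to_noise(mutations: int, false_positives: int) -> float:
    """Ratio of inferred mutations (signal) to false positives (noise)."""
    return mutations / false_positives

# Totals across all six MA lines.
all_mut, all_fp = 40, 52

# The single atypical line that the abstract excludes.
atypical_mut, atypical_fp = 5, 35

# Counts remaining in the other five lines.
rest_mut = all_mut - atypical_mut
rest_fp = all_fp - atypical_fp

ratio_all = signal_to_noise(all_mut, all_fp)
ratio_rest = signal_to_noise(rest_mut, rest_fp)

print(f"all lines: {all_mut} mutations / {all_fp} FPs = {ratio_all:.2f}")
print(f"excluding atypical line: {rest_mut} mutations / {rest_fp} FPs = {ratio_rest:.2f}")
```

With the atypical line removed, 35 mutations against 17 FPs gives a ratio slightly above 2, consistent with the "approximately 2:1" stated in the abstract; with all six lines included, FPs outnumber inferred mutations.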