Charmouh Anders Poulsen, Porsborg Peter Sørud, Hansen Lasse Thorup, Besenbacher Søren, Boeg Winge Sofia, Almstrup Kristian, Hobolth Asger, Bataillon Thomas, Schierup Mikkel Heide
Bioinformatics Research Centre, Aarhus University, University City 81, DK-8000 Aarhus C, Denmark.
Department of Mathematics, Aarhus University, Ny Munkegade 118, DK-8000 Aarhus C, Denmark.
Mol Biol Evol. 2025 Feb 3;42(2). doi: 10.1093/molbev/msaf019.
Gene conversions are broadly defined as the transfer of genetic material from a "donor" to an "acceptor" sequence and can occur in both meiosis and mitosis. They are a subset of noncrossover (NCO) events and, like crossover (CO) events, gene conversion can generate new combinations of alleles and counteract mutation load by reverting germline mutations through GC-biased gene conversion. Estimating the gene conversion rate and the distribution of gene conversion tract lengths remains challenging. We present a new method for estimating the tract length, rate, and detection probability of NCO events directly from HiFi PacBio long-read data. The method can be used to make inferences from sequencing of gametes from a single individual. It is unbiased even under low single nucleotide variant (SNV) densities and does not require any demographic or evolutionary assumptions. We test the accuracy and robustness of our method using simulated datasets in which we vary the length of tracts, the number of tracts, the genomic SNV density, and the level of correlation between SNV density and NCO event position. Our simulations show that under low SNV densities, like those found in humans, only a minute fraction (∼2%) of NCO events are expected to become visible as gene conversions by moving at least 1 SNV. Finally, we illustrate our method by applying it to PacBio sequencing data from human sperm.
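The point that few NCO events become visible under low SNV density can be illustrated with a minimal sketch. It is not the authors' estimator. It assumes, purely for illustration, that tract lengths are geometrically distributed and that each base pair is independently a heterozygous SNV with a fixed probability. The function names and the parameter values (a 30 bp mean tract length and 1 SNV per 1,000 bp) are hypothetical and are not taken from the paper.

```python
import math
import random


def visible_fraction_analytic(mean_tract_len, snv_density):
    """Probability that a tract overlaps at least one SNV.

    Assumes (illustrative only) a geometric tract length on {1, 2, ...}
    with the given mean, and independent SNV sites with probability
    snv_density per base pair.
    """
    p = 1.0 / mean_tract_len   # geometric parameter
    q = 1.0 - snv_density      # probability that a site carries no SNV
    # E[q^L] for geometric L is p*q / (1 - (1 - p)*q); the tract is
    # visible when it is not the case that every site lacks an SNV.
    return 1.0 - (p * q) / (1.0 - (1.0 - p) * q)


def visible_fraction_simulated(mean_tract_len, snv_density,
                               n_events=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    p = 1.0 / mean_tract_len
    visible = 0
    for _ in range(n_events):
        if p >= 1.0:
            length = 1
        else:
            # Inverse-CDF draw of a geometric length on {1, 2, ...}.
            u = 1.0 - rng.random()  # u lies in (0, 1]
            length = int(math.log(u) / math.log(1.0 - p)) + 1
        # The event is visible if any covered site is an SNV.
        if any(rng.random() < snv_density for _ in range(length)):
            visible += 1
    return visible / n_events


if __name__ == "__main__":
    # Hypothetical values chosen only to show the order of magnitude.
    print(visible_fraction_analytic(30, 0.001))
    print(visible_fraction_simulated(30, 0.001))
```

Under these assumed values both functions return a visible fraction of a few percent, which is the same order of magnitude as the figure reported in the abstract. The exact value depends on the assumed tract length distribution and SNV density.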