Sankoff David, Zheng Chunfang, Wang Baoyong, Abad Najar Carlos
BMC Bioinformatics. 2015;16 Suppl 17(Suppl 17):S9. doi: 10.1186/1471-2105-16-S17-S9. Epub 2015 Dec 7.
The loss of duplicate genes (fractionation) after whole genome doubling (WGD) is the subject of debate as to whether it proceeds gene by gene or through deletion of multi-gene chromosomal segments.
WGD produces two copies of every chromosome, namely two identical copies of a sequence of genes. We assume deletion events excise a geometrically distributed number of consecutive genes with mean µ ≥ 1, and these events can combine to produce single-copy runs of length l. If µ = 1, the process is gene-by-gene. If µ > 1, the process at least occasionally excises more than one gene at a time. In the latter case if deletions overlap, the later one simply extends the existing run of single-copy genes. We explore aspects of the predicted distribution of the lengths of single-copy regions analytically, but resort to simulations to show how observing run lengths l allows us to discriminate between the two hypotheses.
Deletion run length distributions can discriminate between gene-by-gene fractionation and deletion of segments of geometrically distributed length, even if µ is only slightly larger than 1, as long as the genome is large enough and fractionation has not proceeded too far towards completion.
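The deletion model described in the abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the two homeologous copies are collapsed into a single track of gene positions, deletion start points are chosen uniformly at random, deletions running past the end of the chromosome are truncated, and fractionation stops once a chosen fraction of positions is single-copy. The names `geometric`, `simulate_fractionation`, `run_lengths` and the parameters `n_genes`, `target_fraction` and `seed` are hypothetical.

```python
import math
import random
from collections import Counter


def geometric(mu, rng):
    """Sample a deletion length >= 1 from a geometric distribution with mean mu >= 1."""
    if mu < 1:
        raise ValueError("mu must be >= 1")
    p = 1.0 / mu
    if p >= 1.0:
        return 1  # mu = 1: gene-by-gene deletion
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    # Inverse-transform sampling: P(length > k) = (1 - p) ** k
    return int(math.log(u) / math.log(1.0 - p)) + 1


def simulate_fractionation(n_genes, mu, target_fraction, seed=0):
    """Apply deletion events until at least target_fraction of positions are single-copy.

    Assumption: an overlapping deletion only extends an existing single-copy run;
    positions that are already single-copy are left unchanged.
    """
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    single = [False] * n_genes
    n_single = 0
    while n_single < target_fraction * n_genes:
        start = rng.randrange(n_genes)
        length = geometric(mu, rng)
        for i in range(start, min(start + length, n_genes)):
            if not single[i]:
                single[i] = True
                n_single += 1
    return single


def run_lengths(single):
    """Count maximal runs of consecutive single-copy positions, keyed by run length l."""
    runs = Counter()
    current = 0
    for is_single in single:
        if is_single:
            current += 1
        elif current:
            runs[current] += 1
            current = 0
    if current:
        runs[current] += 1
    return runs


if __name__ == "__main__":
    # Compare the run-length distributions under the two hypotheses.
    for mu in (1.0, 1.5):
        runs = run_lengths(simulate_fractionation(10000, mu, 0.3, seed=1))
        print(mu, sorted(runs.items())[:5])
```

Comparing the run-length counts produced with µ = 1 against those produced with µ slightly above 1, across genome sizes and fractionation levels, gives a rough feel for when the two hypotheses become distinguishable.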