Betrán E, Rozas J, Navarro A, Barbadilla A
Departament de Genètica i de Microbiologia, Facultat de Ciències, Universitat Autònoma de Barcelona, Bellaterra, Spain.
Genetics. 1997 May;146(1):89-99. doi: 10.1093/genetics/146.1.89.
DNA sequence variation studies report the transfer of small segments of DNA among different sequences through gene conversion events. Here, we provide an algorithm to detect gene conversion tracts and a statistical model to estimate the number and the length distribution of conversion tracts from population DNA sequence data. Two length distributions are defined in the model: (1) that of the observed tract lengths and (2) that of the true tract lengths. If the latter follows a geometric distribution, the relationship between the two distributions depends on two basic parameters: ψ, the probability of detecting a converted site, and φ, the parameter of the geometric distribution, from which the average true tract length, 1/(1 - φ), can be estimated. Expressions are provided for estimating φ by the method of moments and by maximum likelihood. The robustness of the model is examined by computer simulation. The methods have been applied to the published rp49 sequences of Drosophila subobscura. The maximum likelihood estimate of φ for this data set is 0.9918, which corresponds to an average conversion tract length of 122 bp. Only a small percentage of extant conversion events is detected.
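The link between φ and the mean tract length can be illustrated with a minimal Python sketch. It assumes the geometric distribution on {1, 2, ...} with P(L = l) = (1 - φ) φ^(l - 1), which is consistent with the stated mean 1/(1 - φ). It also assumes, unlike the paper's model, that every true tract is observed in full, so the detection probability ψ plays no role. Under that simplification the method-of-moments and maximum likelihood estimators coincide at φ̂ = 1 - 1/mean(L). This is not the authors' estimator, which relates the observed and true length distributions through ψ. All function names here are hypothetical.

```python
import math
import random


def mean_tract_length(phi: float) -> float:
    """Mean of a geometric tract length on {1, 2, ...}: 1 / (1 - phi)."""
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must lie in [0, 1)")
    return 1.0 / (1.0 - phi)


def geometric_pmf(length: int, phi: float) -> float:
    """P(L = length) = (1 - phi) * phi**(length - 1) for length >= 1."""
    if length < 1:
        return 0.0
    return (1.0 - phi) * phi ** (length - 1)


def log_likelihood(phi: float, lengths: list[int]) -> float:
    """Geometric log-likelihood: n*log(1 - phi) + (sum(L) - n)*log(phi), 0 < phi < 1."""
    n, total = len(lengths), sum(lengths)
    return n * math.log(1.0 - phi) + (total - n) * math.log(phi)


def estimate_phi(lengths: list[int]) -> float:
    """Estimate phi from fully observed tract lengths (simplifying assumption: psi ignored).

    Setting the sample mean equal to 1 / (1 - phi) (method of moments) and
    maximizing log_likelihood both give phi_hat = 1 - 1 / mean(lengths).
    """
    if not lengths or min(lengths) < 1:
        raise ValueError("need at least one tract length >= 1")
    mean = sum(lengths) / len(lengths)
    return 1.0 - 1.0 / mean


def simulate_tracts(phi: float, n: int, seed: int = 0) -> list[int]:
    """Draw n tract lengths: start at one site, extend by one site with probability phi."""
    rng = random.Random(seed)
    tracts = []
    for _ in range(n):
        length = 1
        while rng.random() < phi:
            length += 1
        tracts.append(length)
    return tracts


if __name__ == "__main__":
    # Mean true tract length implied by the reported estimate phi = 0.9918.
    print(round(mean_tract_length(0.9918)))
    # Recover phi from simulated, fully observed tracts.
    sample = simulate_tracts(0.9918, 5000, seed=1)
    print(estimate_phi(sample))
```

For example, `mean_tract_length(0.9918)` evaluates to about 121.95, which rounds to the 122 bp quoted above. Extending the model to partially observed tracts would require the paper's ψ-dependent relationship, which this sketch does not attempt to reproduce.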