Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas, Universidad Nacional Autónoma de México, CDMX, Mexico.
Centro de Investigación en Matemáticas, AC (CIMAT), Guanajuato, Mexico.
Theor Popul Biol. 2023 Dec;154:94-101. doi: 10.1016/j.tpb.2023.09.002. Epub 2023 Sep 22.
Multiple-merger coalescents, also known as Λ-coalescents, have been used to describe the genealogy of populations that have a skewed offspring distribution or that undergo strong selection. Inferring the characteristic measure Λ, which describes the rates of the multiple-merger events, is key to understanding these processes. So far, most inference methods work only for particular families of Λ-coalescents described by a single parameter, and not for more general models. This article is devoted to the construction of a non-parametric estimator of the density of Λ based on a single-time observation of the Site Frequency Spectrum (SFS), which describes the allele frequencies in a sample from the present population. First, we estimate the multiple-merger rates by solving a linear system whose coefficients are obtained by appropriately subsampling the SFS. Then, we aggregate the information extracted in the first step through a kernel-type reconstruction to obtain a non-parametric estimate of the measure Λ. We establish a consistency result for this estimator under mild conditions on the behavior of Λ near 0. We also present numerical examples illustrating the performance of our method.
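The estimator described above inverts the map from Λ to the multiple-merger rates. As a hedged illustration of the forward direction only (not the authors' estimator), the sketch below computes the standard rates λ_{b,k} = ∫₀¹ x^(k−2)(1−x)^(b−k) Λ(dx), at which one specific group of k out of b lineages merges, assuming Λ has a smooth density on [0, 1] with no atom at 0. The function names and the midpoint-rule quadrature are illustrative choices. As a check it uses the Bolthausen–Sznitman coalescent, where Λ is the uniform distribution on [0, 1] and λ_{b,k} = (k−2)!(b−k)!/(b−1)!.

```python
"""Sketch: multiple-merger rates of a Lambda-coalescent from a density of Lambda.

Assumption (illustrative): Lambda has a smooth density `lam_density` on [0, 1]
with no atom at 0, so that
    lambda_{b,k} = int_0^1 x^(k-2) (1-x)^(b-k) lam_density(x) dx.
"""
import math
from typing import Callable


def merger_rate(b: int, k: int, lam_density: Callable[[float], float],
                n_grid: int = 20_000) -> float:
    """Rate at which one specific group of k out of b lineages merges (2 <= k <= b).

    Uses a composite midpoint rule, which is adequate only for smooth densities.
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    h = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x ** (k - 2) * (1.0 - x) ** (b - k) * lam_density(x)
    return total * h


def total_merger_rate(b: int, lam_density: Callable[[float], float],
                      n_grid: int = 20_000) -> float:
    """Total rate of any merger event while b lineages are present."""
    return sum(math.comb(b, k) * merger_rate(b, k, lam_density, n_grid)
               for k in range(2, b + 1))


def bolthausen_sznitman_rate(b: int, k: int) -> float:
    """Closed form for Lambda = Uniform(0, 1): B(k-1, b-k+1) = (k-2)!(b-k)!/(b-1)!."""
    return math.factorial(k - 2) * math.factorial(b - k) / math.factorial(b - 1)


if __name__ == "__main__":
    uniform = lambda x: 1.0  # density of Lambda for the Bolthausen-Sznitman coalescent
    for b, k in [(2, 2), (5, 2), (5, 3), (5, 5)]:
        print(b, k, merger_rate(b, k, uniform), bolthausen_sznitman_rate(b, k))
```

The numerical and closed-form columns should agree. For the uniform Λ, summing C(b, k)·λ_{b,k} over k telescopes to a total merger rate of b − 1.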