A hidden Markov model to estimate homozygous-by-descent probabilities associated with nested layers of ancestors.

Affiliations

Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liège, Liège, Belgium.

INRAE, UMR CBGP (INRAE-IRD-Cirad-Montpellier SupAgro), Montferrier-sur-Lez, France.

Publication information

Theor Popul Biol. 2022 Jun;145:38-51. doi: 10.1016/j.tpb.2022.03.001. Epub 2022 Mar 10.

Abstract

Inbreeding results from the mating of related individuals and has negative consequences because it brings together deleterious variants in one individual. Genomic estimates of the inbreeding coefficients are preferred to pedigree-based estimators as they measure the realized inbreeding levels and they are more robust to pedigree errors. Several methods identifying homozygous-by-descent (HBD) segments with hidden Markov models (HMM) have been recently developed and are particularly valuable when the information is degraded or heterogeneous (e.g., low-fold sequencing, low marker density, heterogeneous genotype quality or variable marker spacing). We previously developed a multiple HBD class HMM where HBD segments are classified in different groups based on their length (e.g., recent versus old HBD segments) but we recently observed that for high inbreeding levels with many HBD segments, the estimated contributions might be biased towards more recent classes (i.e., associated with large HBD segments) although the overall estimated level of inbreeding remained unbiased. We herein propose a new model in which the HBD classification is modelled in successive nested levels with decreasing expected HBD segment lengths, the underlying exponential rates being directly related to the number of generations to the common ancestor. The non-HBD classes are now modelled as a mixture of HBD segments from later generations and shorter non-HBD segments (i.e., both with higher rates). The new model has improved statistical properties and performs better on simulated data compared to our previous version. We also show that the parameters of the model are easier to interpret and that the model is more robust to the choice of the number of classes. Overall, the new model results in an improved partitioning of inbreeding in different HBD classes and should be preferred.
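
As a rough illustration of how an exponential rate relates to HBD segment length and ancestor age, the sketch below assumes that segment lengths in a class are exponentially distributed with rate R per Morgan, so the mean length is 1/R Morgans. It also assumes that R is about twice the number of generations to the common ancestor. Neither convention is spelled out in the abstract, which only says the rates are "directly related" to the number of generations. The function names and example rates are hypothetical, and this is not the authors' implementation.

```python
import math


def expected_hbd_length_cm(rate):
    """Mean length in centimorgans of an exponentially distributed
    segment whose rate is expressed per Morgan (mean = 1 / rate)."""
    return 100.0 / rate


def generations_from_rate(rate):
    """ASSUMPTION: rate is roughly twice the number of generations
    to the common ancestor, so generations = rate / 2."""
    return rate / 2.0


def prob_segment_continues(rate, distance_morgans):
    """Probability that a segment extends over a given genetic distance
    without interruption, under the exponential model: exp(-rate * d)."""
    return math.exp(-rate * distance_morgans)


# Illustrative nested classes with increasing rates (hypothetical values):
# higher rates mean older ancestors and shorter expected segments.
example_rates = [2, 4, 8, 16, 32, 64]

for r in example_rates:
    print(
        f"rate={r:>3}  "
        f"approx. generations={generations_from_rate(r):>5.1f}  "
        f"mean length={expected_hbd_length_cm(r):>6.2f} cM  "
        f"P(continues over 1 cM)={prob_segment_continues(r, 0.01):.4f}"
    )
```

Under these assumptions, classes with higher rates correspond to more distant ancestors and shorter expected segments, which matches the abstract's description of successive nested levels with decreasing expected HBD segment lengths.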
