Wang Xiaoqiang, Lebarbier Emilie, Aubert Julie, Robin Stéphane
School of Mathematics and Statistics, Shandong University (Weihai), Weihai, Shandong, China.
UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France.
Int J Biostat. 2019 Feb 19;15(1):/j/ijb.2019.15.issue-1/ijb-2018-0023/ijb-2018-0023.xml. doi: 10.1515/ijb-2018-0023.
Hidden Markov models (HMMs) provide a natural statistical framework for detecting copy number variations (CNVs) in genomics. In this context, we define a hidden Markov process that underlies all individuals jointly in order to detect genomic regions and classify them into different states (typically deletion, normal or amplification). Structural variations from different individuals may be dependent. This is the case in agronomy, where varietal selection programs exist and species share a common phylogenetic past. We propose to take these dependencies into account in the HMM. When dealing with a large number of series, maximum likelihood inference (classically performed using the EM algorithm) becomes intractable. We thus propose an approximate inference algorithm based on a variational approach (VEM), implemented in the CHMM R package. A simulation study is performed to assess the performance of the proposed method, and an application to the detection of structural variations in plant genomes is presented.
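As a rough illustration of the state classification described in the abstract, the sketch below decodes a single series with a three-state Gaussian HMM using the Viterbi algorithm. It is a simplified stand-in under stated assumptions. It treats one individual in isolation, so it does not model the dependencies between individuals, and it uses neither the variational (VEM) inference nor the CHMM R package interface. The state means, the shared standard deviation, the transition structure and the toy signal are all hypothetical values chosen for the example.

```python
import math

# State labels taken from the abstract; the numeric parameters below are assumptions.
STATES = ("deletion", "normal", "amplification")


def gaussian_logpdf(x, mean, sd):
    """Log-density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, means, sd, stay_prob):
    """Most likely hidden state path for one series.

    Assumed model: uniform initial distribution, probability `stay_prob` of
    remaining in the current state, and the remaining mass split evenly
    across the other states.
    """
    k = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (k - 1))
    log_init = math.log(1.0 / k)

    # Best log-score of any path ending in each state at the first position.
    score = [log_init + gaussian_logpdf(obs[0], means[s], sd) for s in range(k)]
    back = []  # back[t][s]: best predecessor of state s at position t + 1

    for x in obs[1:]:
        new_score, pointers = [], []
        for s in range(k):
            cands = [score[r] + (log_stay if r == s else log_move) for r in range(k)]
            best = max(range(k), key=lambda r: cands[r])
            pointers.append(best)
            new_score.append(cands[best] + gaussian_logpdf(x, means[s], sd))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy signal (hypothetical): values near 0 are normal, near -1 deleted, near +1 amplified.
signal = [0.0, 0.1, -0.1, -1.0, -0.9, -1.1, 0.05, -0.05, 0.9, 1.1, 1.0]
path = viterbi(signal, means=(-1.0, 0.0, 1.0), sd=0.3, stay_prob=0.9)
labels = [STATES[s] for s in path]
print(labels)
```

Decoding each series independently in this way ignores exactly the between-individual dependence that the paper sets out to capture, and coupling the hidden chains is what makes exact EM intractable for many series and motivates the variational approximation.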