An Application of Random Walk Resampling to Phylogenetic HMM Inference and Learning.

Publication information

IEEE Trans Nanobioscience. 2020 Jul;19(3):506-517. doi: 10.1109/TNB.2020.2991302. Epub 2020 May 8.

Abstract

Statistical resampling methods are widely used for confidence interval placement and as a data perturbation technique for statistical inference and learning. An important assumption of popular resampling methods such as the standard bootstrap is that input observations are identically and independently distributed (i.i.d.). However, within the area of computational biology and bioinformatics, many different factors can contribute to intra-sequence dependence, such as recombination and other evolutionary processes governing sequence evolution. The SEquential RESampling ("SERES") framework was previously proposed to relax the simplifying assumption of i.i.d. input observations. SERES resampling takes the form of random walks on an input of either aligned or unaligned biomolecular sequences. This study introduces the first application of SERES random walks on aligned sequence inputs and is also the first to demonstrate the utility of SERES as a data perturbation technique to yield improved statistical estimates. We focus on the classical problem of recombination-aware local genealogical inference. We show in a simulation study that coupling SERES resampling and re-estimation with recHMM, a hidden Markov model-based method, produces local genealogical inferences with consistent and often large improvements in terms of topological accuracy. We further evaluate method performance using empirical HIV genome sequence datasets.
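The contrast between the standard bootstrap and a random-walk resample of an alignment can be sketched as follows. This is a simplified illustration, not the SERES procedure from the paper: the abstract does not specify the walk's details, so the function names, the `reversal_prob` parameter, and the rule of reflecting at the alignment ends are all assumptions made here. The point is only that a walk visits neighboring columns in sequence and so preserves local dependence, whereas the bootstrap draws columns independently.

```python
import random


def bootstrap_columns(n_sites, rng):
    """Standard bootstrap: draw column indices independently, with replacement."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]


def random_walk_columns(n_sites, reversal_prob, rng):
    """Simplified random walk over column indices 0..n_sites-1 (illustrative only).

    Starts at a random column in a random direction, moves one column per step,
    reverses direction with probability `reversal_prob` (a hypothetical
    parameter), always turns around at either end, and stops once n_sites
    columns have been visited.
    """
    if n_sites < 2:
        return [0] * n_sites
    pos = rng.randrange(n_sites)
    step = rng.choice((-1, 1))
    walk = []
    while len(walk) < n_sites:
        walk.append(pos)
        # Reverse at an alignment end, or at random with the given probability.
        if not 0 <= pos + step < n_sites or rng.random() < reversal_prob:
            step = -step
        # A random reversal next to an end may point off the alignment; undo it.
        if not 0 <= pos + step < n_sites:
            step = -step
        pos += step
    return walk


def resample_alignment(alignment, columns):
    """Build a resampled alignment by reading the chosen columns in order."""
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy alignment; the taxon names and sequences are made up for illustration.
    toy = {"taxon1": "ACGTACGTAC", "taxon2": "ACGTTCGTAC", "taxon3": "ACCTACGAAC"}
    n = len(next(iter(toy.values())))
    print(resample_alignment(toy, bootstrap_columns(n, rng)))
    print(resample_alignment(toy, random_walk_columns(n, 0.1, rng)))
```

In a resampling-and-re-estimation workflow of the kind the abstract describes, each resampled alignment would be passed to the inference method (recHMM in the study) and the resulting estimates combined; that step is omitted here because the abstract does not describe how it is done.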
