Gaynor Michelle L, Landis Jacob B, O'Connor Timothy K, Laport Robert G, Doyle Jeff J, Soltis Douglas E, Ponciano José Miguel, Soltis Pamela S
Florida Museum of Natural History, University of Florida, Gainesville 32611, Florida, USA.
Department of Biology, University of Florida, Gainesville 32611, Florida, USA.
Appl Plant Sci. 2024 Jul 14;12(4):e11606. doi: 10.1002/aps3.11606. eCollection 2024 Jul-Aug.
Traditional methods of ploidal-level estimation are tedious; using DNA sequence data for cytotype estimation is an ideal alternative. Multiple statistical approaches to leverage sequence data for ploidy inference based on site-based heterozygosity have been developed. However, these approaches may require high-coverage sequence data, use inappropriate probability distributions, or have additional statistical shortcomings that limit inference abilities. We introduce nQuack, an open-source R package that addresses the main shortcomings of current methods.
nQuack performs model selection for improved ploidy predictions. Here, we implement expectation-maximization algorithms with normal, beta, and beta-binomial distributions. Using extensive computer simulations that account for variability in sequencing depth, as well as real data sets, we demonstrate the utility and limitations of nQuack.
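To make the idea of expectation maximization plus model selection concrete, the following is a minimal sketch in Python, not nQuack's R implementation or API. It assumes that allele frequencies at heterozygous sites cluster around k/ploidy, fits a normal mixture with those means held fixed (free weights, one shared standard deviation), and compares ploidies by BIC. The function names, the simulated depth range, and the use of BIC as the selection criterion are all assumptions made for illustration; the beta and beta-binomial variants mentioned above are not shown.

```python
import math
import random

# Assumption of this sketch: expected allele frequencies are k/ploidy.
EXPECTED_MEANS = {
    "diploid": [1 / 2],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [1 / 4, 2 / 4, 3 / 4],
}

def simulate_allele_freqs(ploidy, n_sites=500, seed=1):
    """Simulate per-site allele frequencies with variable sequencing depth."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_sites):
        depth = rng.randint(40, 80)                # hypothetical depth range
        p = rng.randint(1, ploidy - 1) / ploidy    # true allele dosage
        alt = sum(rng.random() < p for _ in range(depth))  # binomial draw
        freqs.append(alt / depth)
    return freqs

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def log_likelihood(freqs, means, weights, sd):
    return sum(
        math.log(sum(w * normal_pdf(x, m, sd) for w, m in zip(weights, means)) + 1e-300)
        for x in freqs
    )

def fit_mixture(freqs, means, n_iter=100, tol=1e-6):
    """EM for a normal mixture: fixed means, free weights, shared sd."""
    k, n = len(means), len(freqs)
    weights, sd = [1.0 / k] * k, 0.1
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each site.
        resp = []
        for x in freqs:
            dens = [w * normal_pdf(x, m, sd) + 1e-300 for w, m in zip(weights, means)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update mixture weights and the shared standard deviation.
        weights = [sum(r[j] for r in resp) / n for j in range(k)]
        var = sum(r[j] * (x - means[j]) ** 2
                  for x, r in zip(freqs, resp) for j in range(k)) / n
        sd = max(math.sqrt(var), 1e-3)
        ll = log_likelihood(freqs, means, weights, sd)
        if ll - prev < tol:
            break
        prev = ll
    return weights, sd, ll

def select_ploidy(freqs):
    """Fit every candidate ploidy and return the one with the lowest BIC."""
    bic = {}
    for name, means in EXPECTED_MEANS.items():
        _, _, ll = fit_mixture(freqs, means)
        n_params = (len(means) - 1) + 1  # free weights plus shared sd
        bic[name] = n_params * math.log(len(freqs)) - 2 * ll
    return min(bic, key=bic.get), bic

freqs = simulate_allele_freqs(ploidy=3)
best, bic = select_ploidy(freqs)
print(best, {name: round(value, 1) for name, value in bic.items()})
```

Holding the component means fixed keeps each candidate model tied to one ploidy hypothesis, so the comparison is between ploidies rather than between arbitrary mixtures. A normal distribution is only an approximation for bounded, depth-dependent frequencies, which is one reason distributions such as the beta-binomial are also considered.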
Inferring ploidy based on site-based heterozygosity alone is difficult. Even though nQuack is more accurate than similar methods, we suggest caution when relying on any site-based heterozygosity method to infer ploidy.