Department of Mathematics and Statistics, American University, Washington, DC 20016, USA.
Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA.
Bioinformatics. 2020 Mar 1;36(6):1795-1800. doi: 10.1093/bioinformatics/btz852.
Empirical Bayes techniques to genotype polyploid organisms usually either (i) assume technical artifacts are known a priori or (ii) estimate technical artifacts simultaneously with the prior genotype distribution. Case (i) is unappealing as it places the onus on the researcher to estimate these artifacts, or to ensure that there are no systematic biases in the data. However, as we demonstrate with a few empirical examples, case (ii) makes choosing the class of prior genotype distributions extremely important. Choosing a class that is either too flexible or too restrictive results in poor genotyping performance.
We propose two classes of prior genotype distributions that are of intermediate levels of flexibility: the class of proportional normal distributions and the class of unimodal distributions. We provide a complete characterization of and optimization details for the class of unimodal distributions. We demonstrate, using both simulated and real data, that using these classes results in superior genotyping performance.
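To make the two prior classes concrete, the following is a minimal Python sketch and not the updog implementation. It assumes that a "proportional normal" prior places mass on each genotype k = 0, ..., K (K being the ploidy) in proportion to a normal density evaluated at k, and that "unimodal" means the probabilities are non-decreasing up to some mode and non-increasing afterwards. The function names and parameter values are hypothetical, chosen only for illustration.

```python
import math


def proportional_normal_prior(ploidy, mu, sigma):
    """Prior over genotypes 0..ploidy with mass proportional to N(k; mu, sigma^2).

    Assumption: this is one reading of "proportional normal"; the article's
    exact parameterization may differ.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Work on the log scale and subtract the maximum for numerical stability.
    log_w = [-0.5 * ((k - mu) / sigma) ** 2 for k in range(ploidy + 1)]
    top = max(log_w)
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def is_unimodal(probs, tol=1e-12):
    """True if probs never rises again after it has started to fall."""
    falling = False
    for prev, curr in zip(probs, probs[1:]):
        if curr < prev - tol:
            falling = True
        elif curr > prev + tol and falling:
            return False
    return True


# Hypothetical hexaploid example: 7 possible genotypes (0 to 6 copies).
prior = proportional_normal_prior(ploidy=6, mu=2.3, sigma=1.1)
print([round(p, 4) for p in prior])
print(is_unimodal(prior))
```

Under these assumptions, every proportional normal prior is also unimodal, so the unimodal class is the more flexible of the two while still excluding multi-peaked genotype distributions.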
Genotyping methods that use these priors are implemented in the updog R package available on the Comprehensive R Archive Network: https://cran.r-project.org/package=updog. All code needed to reproduce the results of this article is available on GitHub: https://github.com/dcgerard/reproduce_prior_sims.
Supplementary data are available at Bioinformatics online.