Priors for genotyping polyploids.

Affiliations

Department of Mathematics and Statistics, American University, Washington, DC 20016, USA.

Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA.

Publication information

Bioinformatics. 2020 Mar 1;36(6):1795-1800. doi: 10.1093/bioinformatics/btz852.

Abstract

MOTIVATION

Empirical Bayes techniques to genotype polyploid organisms usually either (i) assume technical artifacts are known a priori or (ii) estimate technical artifacts simultaneously with the prior genotype distribution. Case (i) is unappealing as it places the onus on the researcher to estimate these artifacts, or to ensure that there are no systematic biases in the data. However, as we demonstrate with a few empirical examples, case (ii) makes choosing the class of prior genotype distributions extremely important. Choosing a class that is either too flexible or too restrictive results in poor genotyping performance.

RESULTS

We propose two classes of prior genotype distributions that are of intermediate levels of flexibility: the class of proportional normal distributions and the class of unimodal distributions. We provide a complete characterization of and optimization details for the class of unimodal distributions. We demonstrate, using both simulated and real data, that using these classes results in superior genotyping performance.
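
To make the two prior classes more concrete, the following is a minimal sketch and not the updog implementation. It assumes that a "proportional normal" prior assigns each dosage k in 0..ploidy a probability proportional to a normal density evaluated at k, and that "unimodal" means the probabilities rise weakly to a single peak and then fall weakly. The function names `proportional_normal_prior` and `is_unimodal`, and the parameters `mu`, `sigma` and `tol`, are hypothetical and chosen only for illustration.

```python
import math


def proportional_normal_prior(ploidy, mu, sigma):
    """Probabilities over dosages 0..ploidy, proportional to a normal density.

    Assumption (not taken from the abstract): Pr(k) is proportional to
    exp(-(k - mu)^2 / (2 * sigma^2)) for k = 0, 1, ..., ploidy.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Unnormalized log-weights; subtracting the maximum keeps exp() stable.
    log_w = [-0.5 * ((k - mu) / sigma) ** 2 for k in range(ploidy + 1)]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def is_unimodal(probs, tol=1e-12):
    """True if probs never increases again after it has started to decrease."""
    falling = False
    for prev, cur in zip(probs, probs[1:]):
        if cur < prev - tol:
            falling = True
        elif cur > prev + tol and falling:
            return False
    return True


# Example: a hexaploid (dosages 0..6) with the prior centred between 2 and 3.
prior = proportional_normal_prior(ploidy=6, mu=2.5, sigma=1.0)
```

Under these assumptions, every proportional normal prior is also unimodal, so the unimodal class is the more flexible of the two: it constrains only the shape of the probabilities, not their functional form.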

AVAILABILITY AND IMPLEMENTATION

Genotyping methods that use these priors are implemented in the updog R package available on the Comprehensive R Archive Network: https://cran.r-project.org/package=updog. All code needed to reproduce the results of this article is available on GitHub: https://github.com/dcgerard/reproduce_prior_sims.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
