Pierotti Saul, Welz Bettina, Osuna-López Mireia, Fitzgerald Tomas, Wittbrodt Joachim, Birney Ewan
European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge CB10 1SD, United Kingdom.
Centre for Organismal Studies (COS), Heidelberg University, Heidelberg 69120, Germany.
Bioinform Adv. 2024 Jul 23;4(1):vbae107. doi: 10.1093/bioadv/vbae107. eCollection 2024.
Crosses among inbred lines are a fundamental tool for the discovery of genetic loci associated with phenotypes of interest. In organisms for which large reference panels or SNP chips are not available, imputation from low-pass whole-genome sequencing is an effective method for obtaining genotype data from a large number of individuals. To date, a structured analysis of the conditions required for optimal genotype imputation has not been performed.
We report a systematic exploration of the effect of several design variables on imputation performance in F2 crosses of inbred medaka lines using the imputation software STITCH. We determined that imputation performance reaches a plateau as per-sample sequencing coverage increases, in a manner that depends on the number of samples. We also systematically explored the trade-offs between cost, imputation accuracy, and sample numbers. We developed a computational pipeline to streamline the process, enabling other researchers to perform a similar cost-benefit analysis on their population of interest.
The source code for the pipeline is available at https://github.com/birneylab/stitchimpute. While our pipeline has been developed and tested for an F2 population, the software can also be used to analyse populations with a different structure.
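As an illustration of the cost side of the trade-off described above, the following minimal sketch computes the mean per-sample coverage affordable under a fixed budget for different sample numbers. It is not part of the published pipeline. The function name, the cost model (a fixed library preparation cost per sample plus a sequencing cost per gigabase), and every numeric value, including the genome size, are hypothetical assumptions chosen only for the example.

```python
def coverage_per_sample(budget, n_samples, library_cost, cost_per_gb, genome_size_gb):
    """Mean per-sample coverage affordable under a fixed budget.

    Hypothetical cost model: each sample incurs a fixed library
    preparation cost, and the remaining budget buys sequencing
    output at a flat price per gigabase, shared equally.
    """
    sequencing_budget = budget - n_samples * library_cost
    if sequencing_budget <= 0:
        # Library preparation alone exhausts the budget.
        return 0.0
    total_gb = sequencing_budget / cost_per_gb
    return total_gb / (n_samples * genome_size_gb)


if __name__ == "__main__":
    # All values below are illustrative assumptions, not figures from the study.
    BUDGET = 10_000.0       # total budget, arbitrary currency units
    LIBRARY_COST = 20.0     # per-sample library preparation cost
    COST_PER_GB = 10.0      # sequencing cost per gigabase
    GENOME_SIZE_GB = 0.8    # assumed genome size in gigabases

    for n in (50, 100, 200, 400):
        cov = coverage_per_sample(BUDGET, n, LIBRARY_COST, COST_PER_GB, GENOME_SIZE_GB)
        print(f"{n:4d} samples -> {cov:.2f}x mean coverage per sample")
```

Under this assumed model, adding samples lowers the affordable coverage per sample faster than proportionally, because library preparation consumes a growing share of the budget. Whether the resulting coverage is sufficient is an empirical question about imputation accuracy that a calculation like this cannot answer.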