Benchmarking of low coverage sequencing workflows for precision genotyping in eggplant.

Author information

Baraja-Fonseca Virginia, Arrones Andrea, Vilanova Santiago, Plazas Mariola, Prohens Jaime, Bombarely Aureliano, Gramazio Pietro

Affiliations

Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, Camino de Vera 14, Valencia, 46022, Spain.

Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas - Universitat Politècnica de València, Camino de Vera 14, Valencia, 46022, Spain.

Publication information

BMC Plant Biol. 2025 Aug 25;25(1):1125. doi: 10.1186/s12870-025-07242-x.

Abstract

BACKGROUND

Low-coverage whole-genome sequencing (lcWGS) is a cost-effective genotyping strategy, particularly for applications that require high marker density at reduced cost. In this study, we evaluated lcWGS for eggplant genotyping using the eight founder accessions of the first eggplant MAGIC population (MEGGIC). We tested several sequencing coverages and minimum depth-of-coverage thresholds with two SNP callers, Freebayes and GATK. Reference SNP panels were used to estimate the percentage of shared biallelic SNPs (i.e., true positives) relative to the total SNPs in the low-coverage datasets (accuracy) and in the SNP panels themselves (sensitivity). Furthermore, the percentage of true positives with the same genotype in both datasets was calculated to assess genotypic concordance.
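The three metrics defined above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each call set has already been reduced to biallelic SNPs and loaded into a dictionary keyed by (chromosome, position), holding REF, ALT and an unphased genotype string such as "0/1" (a hypothetical layout; real inputs would be parsed from the Freebayes or GATK VCF files). A site counts as a true positive when it appears in both sets with the same REF and ALT alleles.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]
# Hypothetical layout: (chromosome, position) -> (REF, ALT, genotype), e.g. ("A", "G", "0/1").
# Genotypes are assumed to be normalized unphased strings, so "0/1" and "1/0" never both occur.
Calls = Dict[Site, Tuple[str, str, str]]


def true_positives(low_cov: Calls, panel: Calls) -> Set[Site]:
    """Sites present in both call sets with identical REF and ALT alleles."""
    shared = low_cov.keys() & panel.keys()
    return {s for s in shared if low_cov[s][:2] == panel[s][:2]}


def benchmark(low_cov: Calls, panel: Calls) -> Dict[str, float]:
    """Accuracy, sensitivity and genotypic concordance, as percentages."""
    tp = true_positives(low_cov, panel)
    same_gt = sum(1 for s in tp if low_cov[s][2] == panel[s][2])
    return {
        # true positives relative to all SNPs called in the low-coverage dataset
        "accuracy": 100 * len(tp) / len(low_cov) if low_cov else 0.0,
        # true positives relative to all SNPs in the reference panel
        "sensitivity": 100 * len(tp) / len(panel) if panel else 0.0,
        # true positives whose genotype matches in both datasets
        "concordance": 100 * same_gt / len(tp) if tp else 0.0,
    }


panel = {
    ("chr1", 100): ("A", "G", "1/1"),
    ("chr1", 250): ("C", "T", "1/1"),
    ("chr2", 50): ("G", "A", "0/1"),
    ("chr2", 900): ("T", "C", "1/1"),
}
low_cov = {
    ("chr1", 100): ("A", "G", "1/1"),  # true positive, same genotype
    ("chr1", 250): ("C", "T", "0/1"),  # true positive, discordant genotype
    ("chr3", 10): ("G", "T", "1/1"),   # absent from the panel: false positive
}
print(benchmark(low_cov, panel))
```

With two true positives out of three low-coverage calls and four panel SNPs, this toy example gives about 66.7% accuracy, 50% sensitivity and 50% concordance, which shows how a sparse low-coverage dataset can score well on accuracy while lagging on sensitivity.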

RESULTS

Sequencing coverages as low as 1X and 2X achieved high accuracy but lacked sufficient sensitivity and genotypic concordance. In contrast, 3X sequencing reached approximately 10% lower sensitivity than 5X while maintaining genotypic concordance above 90% at every depth-of-coverage threshold. Freebayes outperformed GATK in both sensitivity and genotypic concordance. We therefore used Freebayes to conduct a pilot test with a subset of MEGGIC lines from the fifth generation of selfing, comparing their datasets with a gold standard. Coverage as low as 1X identified a substantial number of true positives, and 3X significantly increased this number, particularly at moderate depth-of-coverage thresholds. Additionally, at coverages above 2X, at least 30% of the true positives were consistently genotyped in all lines, regardless of the depth-of-coverage threshold applied.
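The interplay between the minimum depth-of-coverage threshold and the number of true positives genotyped in all lines can be illustrated with the sketch below. It is an assumption-laden toy, not the study's workflow: it uses a hypothetical per-line mapping from (chromosome, position) to (genotype, read depth), keeps a call only when its depth reaches the threshold, and reads "consistently genotyped in all lines" as a gold-standard site that survives the filter in every line; the paper may define this differently.

```python
from typing import Dict, List, Set, Tuple

Site = Tuple[str, int]
# Hypothetical per-line calls: (chromosome, position) -> (genotype, read depth at that site).
LineCalls = Dict[Site, Tuple[str, int]]


def depth_filter(calls: LineCalls, min_dp: int) -> Dict[Site, str]:
    """Keep only genotypes supported by at least min_dp reads."""
    return {site: gt for site, (gt, dp) in calls.items() if dp >= min_dp}


def shared_fraction(lines: List[LineCalls], gold_sites: Set[Site], min_dp: int) -> float:
    """Percentage of gold-standard sites that pass the depth filter in every line."""
    if not gold_sites:
        return 0.0
    kept = set(gold_sites)
    for calls in lines:
        # iterating the filtered dict yields its site keys
        kept.intersection_update(depth_filter(calls, min_dp))
    return 100 * len(kept) / len(gold_sites)


gold = {("chr1", 100), ("chr1", 250), ("chr2", 50)}
line_a = {("chr1", 100): ("1/1", 3), ("chr1", 250): ("0/0", 1), ("chr2", 50): ("1/1", 2)}
line_b = {("chr1", 100): ("0/0", 2), ("chr1", 250): ("1/1", 4), ("chr2", 50): ("1/1", 5)}

for min_dp in (1, 2, 3):
    print(min_dp, shared_fraction([line_a, line_b], gold, min_dp))
```

Raising the threshold discards low-depth calls that are more likely to be wrong, but each additional line is another chance for a site to drop out, so the shared fraction in this toy example falls from 100% at a threshold of 1 to about 66.7% at 2 and 0% at 3.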

CONCLUSIONS

This study highlights the importance of using a gold standard to reduce false positives and demonstrates that lcWGS, with proper filtering, is a valuable alternative to high-coverage sequencing for eggplant genotyping, with potential applications to other crops.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/704e/12379343/99db9ba2cb2c/12870_2025_7242_Fig1_HTML.jpg
