Learning a refinement model for variant analysis in non-human primate genomes.

Authors

Choi Jeonghoon, Zhou Bo, Song Giltae

Affiliations

Division of Artificial Intelligence, School of Computer Science and Engineering, Pusan National University, Busan, South Korea.

Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX, 77843, USA.

Publication information

BMC Genomics. 2025 Aug 25;26(1):775. doi: 10.1186/s12864-025-11921-2.

Abstract

BACKGROUND

Accurate variant calling is essential for genomic studies but is highly dependent on sequence alignment (SA) quality. In non-human primates, the lack of well-curated variant resources limits alignment postprocessing, leading to suboptimal SA and increased miscalls. DeepVariant, a leading variant caller, demonstrates high accuracy in human genomes but exhibits performance degradation under suboptimal SA conditions.

RESULTS

To address this, we developed a decision tree-based refinement model that integrates alignment quality metrics and DeepVariant confidence scores to filter miscalls effectively. We defined suboptimal SA and optimal SA based on the presence or absence of postprocessing steps and confirmed that suboptimal SA significantly increases miscalls in both human and rhesus macaque genomes. Applying the refinement model to human suboptimal SA reduced the miscalling ratio (MR) by 52.54%, demonstrating its effectiveness. When applied to rhesus macaque genomes, the model achieved a 76.20% MR reduction, showing its potential for non-human primate studies. Alternative base ratio (ABR) analysis further revealed that the model refines homozygous SNVs more effectively than heterozygous SNVs, improving variant classification reliability.
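As a rough illustration of the idea described above, the following sketch filters calls with a hand-written two-level decision rule and then computes the relative reduction in miscalling ratio. It is not the authors' model: the abstract does not list the features, thresholds, or the exact MR formula, so the field names (`mapq`, `gq`), the threshold values, the toy data, and the definition of MR as miscalls divided by total calls are all assumptions made for this example. The published model is learned from data rather than set by hand.

```python
from dataclasses import dataclass


@dataclass
class Call:
    mapq: float       # hypothetical alignment quality metric (e.g. mean mapping quality)
    gq: float         # hypothetical caller confidence score (e.g. genotype quality)
    is_miscall: bool  # ground-truth label, available only in a benchmark setting


def keep(call, mapq_min=30.0, gq_min=20.0):
    """Two-level decision rule; both thresholds are illustrative, not from the paper."""
    if call.mapq < mapq_min:
        return False
    return call.gq >= gq_min


def miscalling_ratio(calls):
    """Assumed definition: number of miscalls divided by total number of calls."""
    return sum(c.is_miscall for c in calls) / len(calls) if calls else 0.0


def relative_reduction(before, after):
    """Relative reduction of a ratio, in percent."""
    return 100.0 * (before - after) / before


# Toy data, invented for this example.
calls = [
    Call(60, 50, False),
    Call(60, 45, False),
    Call(55, 40, False),
    Call(20, 35, True),
    Call(58, 10, True),
    Call(57, 30, True),
    Call(25, 15, False),
    Call(59, 48, False),
]

refined = [c for c in calls if keep(c)]
mr_before = miscalling_ratio(calls)
mr_after = miscalling_ratio(refined)
reduction = relative_reduction(mr_before, mr_after)
print(f"MR before: {mr_before:.3f}, after: {mr_after:.3f}, reduction: {reduction:.2f}%")
```

In this toy data the rule also discards one correct call. Any refinement filter carries that trade-off, which is why MR reduction is usually reported alongside how many true variants are retained.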

CONCLUSIONS

Our refinement model significantly improves variant calling under suboptimal SA conditions, which is particularly beneficial for non-human primate studies where alignment postprocessing is often limited. We packaged our model into the Genome Variant Refinement Pipeline (GVRP), making it available to researchers working on population genetics and molecular evolution. This work establishes a framework for enhancing variant calling accuracy in species with limited genomic resources.

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1186/s12864-025-11921-2.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dee/12379468/8de0a699d42f/12864_2025_11921_Fig1_HTML.jpg
