
Learning a refinement model for variant analysis in non-human primate genomes.

Authors

Choi Jeonghoon, Zhou Bo, Song Giltae

Affiliations

Division of Artificial Intelligence, School of Computer Science and Engineering, Pusan National University, Busan, South Korea.

Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX, 77843, USA.

Publication information

BMC Genomics. 2025 Aug 25;26(1):775. doi: 10.1186/s12864-025-11921-2.

DOI:10.1186/s12864-025-11921-2
PMID:40855258
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12379468/
Abstract

BACKGROUND

Accurate variant calling is essential for genomic studies but is highly dependent on sequence alignment (SA) quality. In non-human primates, the lack of well-curated variant resources limits alignment postprocessing, leading to suboptimal SA and increased miscalls. DeepVariant, a leading variant caller, demonstrates high accuracy in human genomes but exhibits performance degradation under suboptimal SA conditions.

RESULTS

To address this, we developed a decision tree-based refinement model that integrates alignment quality metrics and DeepVariant confidence scores to filter miscalls effectively. We defined suboptimal SA and optimal SA based on the presence or absence of postprocessing steps and confirmed that suboptimal SA significantly increases miscalls in both human and rhesus macaque genomes. Applying the refinement model to human suboptimal SA reduced the miscalling ratio (MR) by 52.54%, demonstrating its effectiveness. When applied to rhesus macaque genomes, the model achieved a 76.20% MR reduction, showing its potential for non-human primate studies. Alternative base ratio (ABR) analysis further revealed that the model refines homozygous SNVs more effectively than heterozygous SNVs, improving variant classification reliability.
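The filtering idea described above can be illustrated with a minimal sketch. It is not the authors' trained model: the tree below is hand-written, the feature names (`mapq`, `gq`) and every threshold are hypothetical placeholders, and the definitions of ABR (alternative reads over total reads) and MR (miscalls over all calls) are assumptions, since the abstract does not spell them out. The toy calls are invented purely to exercise the code.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One SNV call with illustrative (hypothetical) features."""
    mapq: float      # mean mapping quality at the site (alignment quality metric)
    gq: float        # caller confidence score, e.g. a genotype quality
    ref_reads: int   # reads supporting the reference base
    alt_reads: int   # reads supporting the alternative base
    is_true: bool    # truth label, available only for benchmark data


def alt_base_ratio(call: Call) -> float:
    """Fraction of reads supporting the alternative base (assumed ABR definition)."""
    depth = call.ref_reads + call.alt_reads
    return call.alt_reads / depth if depth else 0.0


def keep_call(call: Call) -> bool:
    """Hand-written two-level decision tree; all thresholds are placeholders."""
    if call.gq >= 30:
        # Confident calls are kept unless the alignment itself is poor.
        return call.mapq >= 20
    # Low-confidence calls survive only with strong alignment and allele support.
    return call.mapq >= 50 and alt_base_ratio(call) >= 0.9


def miscalling_ratio(calls: list[Call]) -> float:
    """Share of calls that are wrong (assumed MR definition)."""
    if not calls:
        return 0.0
    return sum(not c.is_true for c in calls) / len(calls)


def mr_reduction_percent(before: float, after: float) -> float:
    """Relative reduction of the miscalling ratio, in percent."""
    return (before - after) / before * 100.0 if before else 0.0


# Invented toy data: three true calls and three miscalls.
calls = [
    Call(mapq=60, gq=45, ref_reads=0, alt_reads=30, is_true=True),
    Call(mapq=58, gq=40, ref_reads=14, alt_reads=16, is_true=True),
    Call(mapq=12, gq=35, ref_reads=20, alt_reads=4, is_true=False),
    Call(mapq=55, gq=10, ref_reads=25, alt_reads=3, is_true=False),
    Call(mapq=60, gq=25, ref_reads=1, alt_reads=29, is_true=True),
    Call(mapq=40, gq=33, ref_reads=10, alt_reads=12, is_true=False),
]

refined = [c for c in calls if keep_call(c)]
mr_before = miscalling_ratio(calls)
mr_after = miscalling_ratio(refined)
reduction = mr_reduction_percent(mr_before, mr_after)
print(f"MR before: {mr_before:.2f}, after: {mr_after:.2f}, reduction: {reduction:.1f}%")
```

In practice the tree would be learned from labeled benchmark calls rather than written by hand. The sketch only shows how alignment quality and caller confidence can be combined into a keep-or-drop decision, and how a relative MR reduction such as the reported 52.54% would be computed under the assumed definition.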

CONCLUSIONS

Our refinement model significantly improves variant calling in suboptimal SA conditions, which is particularly beneficial for non-human primate studies where alignment postprocessing is often limited. We packaged our model into the Genome Variant Refinement Pipeline (GVRP), making it available to researchers working on population genetics and molecular evolution. This work establishes a framework for enhancing variant calling accuracy in species with limited genomic resources.

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1186/s12864-025-11921-2.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dee/12379468/8de0a699d42f/12864_2025_11921_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dee/12379468/06df0bc95ed4/12864_2025_11921_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dee/12379468/9b3de126144c/12864_2025_11921_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dee/12379468/4abfcb88060e/12864_2025_11921_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dee/12379468/7067449cd57f/12864_2025_11921_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dee/12379468/d686533a06bd/12864_2025_11921_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dee/12379468/1f7f6b23b6c5/12864_2025_11921_Fig7_HTML.jpg

