Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index.

Affiliations

Ivannikov Institute for System Programming, Moscow, Russia.

Institute for Information Transmission Problems, Moscow, Russia.

Publication information

BMC Bioinformatics. 2024 Jul 13;25(1):238. doi: 10.1186/s12859-024-05862-y.

DOI:10.1186/s12859-024-05862-y
PMID:39003441
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11246581/
Abstract

MOTIVATION

Aligning reads to a reference genome sequence is one of the key steps in analysing human whole-genome sequencing data produced by next-generation sequencing (NGS) technologies. The quality of subsequent steps, such as the clinical interpretation of genetic variants or a genome-wide association study, depends on each read being assigned to its correct position during alignment. The amount of human NGS whole-genome sequencing data is constantly growing, and a number of human genome sequencing projects worldwide have produced large-scale databases of genetic variants from sequenced human genomes. Information about known genetic variants can be used to improve alignment quality when analysing sequencing data from a new individual, for example by creating a genomic graph. Existing methods for aligning reads to a linear reference genome are fast, whereas methods for aligning reads to a genomic graph are more accurate in variable regions of the genome. A read alignment method that takes known genetic variants into account in the index of the linear reference sequence can combine the advantages of both approaches.

RESULTS

In this paper, we present the minimap2_index_modifier tool, which enables the construction of a modified index of a reference genome using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead. Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14) it was demonstrated that the number of false negative genetic variants decreased by more than 9500, and the number of false positives decreased by more than 7000 when modifying the index with genetic variants from the Human Pangenome Reference Consortium.
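
To make the idea of a variant-aware index concrete, here is a minimal Python sketch. It is not the minimap2_index_modifier implementation, whose internals the abstract does not describe. It assumes a simplified minimizer scheme (the lexicographically smallest k-mer in each window of w consecutive k-mers, whereas minimap2 itself orders hashed canonical k-mers), handles SNVs only, and uses small illustrative values rather than k = 27 and w = 14. The function names `minimizers`, `build_index`, and `seed_hits`, and the toy sequences, are hypothetical.

```python
from collections import defaultdict


def minimizers(seq, k, w):
    """Return {(kmer, start)}: the smallest k-mer of every window of w consecutive k-mers."""
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    return {min(kmers[s:s + w]) for s in range(len(kmers) - w + 1)}


def build_index(ref, k, w, snvs=()):
    """Map minimizer k-mer -> set of reference start positions.

    snvs is an iterable of (0-based position, alternate base); this input
    format is an assumption made for the sketch.
    """
    index = defaultdict(set)
    for kmer, start in minimizers(ref, k, w):
        index[kmer].add(start)
    for pos, alt in snvs:
        # Local context covering every window that contains a k-mer spanning the SNV.
        lo = max(0, pos - (k + w - 2))
        hi = min(len(ref), pos + k + w - 1)
        context = ref[lo:pos] + alt + ref[pos + 1:hi]
        for kmer, start in minimizers(context, k, w):
            if start <= pos - lo < start + k:  # keep only k-mers that span the variant
                index[kmer].add(lo + start)
    return index


def seed_hits(read, index, k, w):
    """Count read minimizers whose k-mer is present in the index."""
    return sum(1 for kmer, _ in minimizers(read, k, w) if kmer in index)


# Toy reference without any T, so a k-mer carrying the alternate base T
# cannot already be present in the unmodified index.
ref = "ACGGACCAGCAGGACAGCCAGGACGACCAGGCAGCAAGGCC"
pos, alt = 20, "T"
read = ref[8:pos] + alt + ref[pos + 1:33]  # read carrying the alternate allele

plain = build_index(ref, k=5, w=3)
modified = build_index(ref, k=5, w=3, snvs=[(pos, alt)])
print("seed hits, plain index:   ", seed_hits(read, plain, 5, 3))
print("seed hits, modified index:", seed_hits(read, modified, 5, 3))
```

In this toy setting the read's seeds that overlap the variant can only be found in the modified index, which illustrates why adding known variants may help anchor reads in variable regions while the reference itself stays linear.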

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de2e/11246581/92f2a96332b5/12859_2024_5862_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de2e/11246581/4d3c441e7d29/12859_2024_5862_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de2e/11246581/1ec9dd11d75a/12859_2024_5862_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de2e/11246581/b7e92acfa9a6/12859_2024_5862_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de2e/11246581/4f5ff5ceec4a/12859_2024_5862_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de2e/11246581/818e1c5f058e/12859_2024_5862_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de2e/11246581/9fb3efe2acc4/12859_2024_5862_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de2e/11246581/bd2102d6c7a7/12859_2024_5862_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de2e/11246581/a96d3a5ab1f8/12859_2024_5862_Fig9_HTML.jpg

Cited by

1
Correction: Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index.
BMC Bioinformatics. 2024 Aug 19;25(1):268. doi: 10.1186/s12859-024-05892-6.
