Reference flow: reducing reference bias using multiple population genomes.

Affiliations

Department of Computer Science, Johns Hopkins University, Baltimore, USA.

Publication details

Genome Biol. 2021 Jan 4;22(1):8. doi: 10.1186/s13059-020-02229-3.

DOI: 10.1186/s13059-020-02229-3

PMID: 33397413

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7780692/

Abstract

Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
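The abstract describes the method only at a high level. The Python sketch below is a hedged illustration of two ideas it relies on:

- Merging the alignments of one read made against several references by keeping the best-scoring one.
- Measuring reference bias as the fraction of reads supporting the reference (REF) allele at heterozygous sites, where 0.5 means no bias.

Several parts are hypothetical assumptions, not the paper's implementation:

- the `Alignment` record;
- the reference labels;
- the tie-break rule (alignment score, then MAPQ, then first seen);
- the pooled fraction.

The published pipeline's exact realignment, selection and coordinate-conversion steps are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Alignment:
    """Hypothetical per-read alignment record (not any real aligner's API)."""
    read_id: str
    reference: str  # illustrative labels, e.g. "major", "AFR", "EUR"
    score: int      # alignment score; higher is better (assumed convention)
    mapq: int       # mapping quality


def merge_best(alignments: Iterable[Alignment]) -> Dict[str, Alignment]:
    """Keep one alignment per read: highest score, then highest MAPQ.

    Exact ties on both keys keep the alignment seen first (an assumed rule).
    """
    best: Dict[str, Alignment] = {}
    for aln in alignments:
        cur = best.get(aln.read_id)
        # Tuple comparison checks score first, then MAPQ.
        if cur is None or (aln.score, aln.mapq) > (cur.score, cur.mapq):
            best[aln.read_id] = aln
    return best


def ref_allele_fraction(ref_counts: List[int], alt_counts: List[int]) -> float:
    """Pooled REF / (REF + ALT) read counts over heterozygous sites.

    A value near 0.5 suggests no reference bias; values above 0.5 mean reads
    carrying the reference allele are over-represented.
    """
    ref_total = sum(ref_counts)
    total = ref_total + sum(alt_counts)
    if total == 0:
        raise ValueError("no reads overlap the heterozygous sites")
    return ref_total / total


if __name__ == "__main__":
    alns = [
        Alignment("r1", "major", score=-12, mapq=3),
        Alignment("r1", "AFR", score=-2, mapq=40),
        Alignment("r2", "major", score=0, mapq=60),
        Alignment("r2", "EUR", score=0, mapq=60),
    ]
    chosen = merge_best(alns)
    print({rid: a.reference for rid, a in chosen.items()})
    print(ref_allele_fraction([12, 9], [8, 11]))
```

Under these assumptions, read r1 is assigned to the AFR reference because it scores better there. Read r2 keeps its first-seen alignment on an exact tie. The pooled REF fraction is 21/40 = 0.525.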

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/03f7/7780692/1050f4599421/13059_2020_2229_Fig1_HTML.jpg

Similar articles

Fast and SNP-aware short read alignment with SALT.

BMC Bioinformatics. 2021 Aug 25;22(Suppl 9):172. doi: 10.1186/s12859-021-04088-6.

Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph.

Genome Biol. 2020 Sep 17;21(1):250. doi: 10.1186/s13059-020-02160-7.

DNA sequences alignment method using sparse index on pan-genome graph.

J Bioinform Comput Biol. 2024 Aug;22(4):2450019. doi: 10.1142/S0219720024500197. Epub 2024 Aug 31.

Fast alignment of reads to a variation graph with application to SNP detection.

J Integr Bioinform. 2021 Nov 16;18(4):20210032. doi: 10.1515/jib-2021-0032.

Meta-aligner: long-read alignment based on genome statistics.

BMC Bioinformatics. 2017 Feb 23;18(1):126. doi: 10.1186/s12859-017-1518-y.

Aligner optimization increases accuracy and decreases compute times in multi-species sequence data.

Microb Genom. 2017 Jul 8;3(9):e000122. doi: 10.1099/mgen.0.000122. eCollection 2017 Sep.

Fast and accurate genomic analyses using genome graphs.

Nat Genet. 2019 Feb;51(2):354-362. doi: 10.1038/s41588-018-0316-4. Epub 2019 Jan 14.

Variation graph toolkit improves read mapping by representing genetic variation in the reference.

Nat Biotechnol. 2018 Oct;36(9):875-879. doi: 10.1038/nbt.4227. Epub 2018 Aug 20.

Calling known variants and identifying new variants while rapidly aligning sequence data.

J Dairy Sci. 2019 Apr;102(4):3216-3229. doi: 10.3168/jds.2018-15172. Epub 2019 Feb 14.

Cited by

Genetic Signatures of Competitive Performance in Burmese Gamecocks: A Transcriptomic Analysis.

Biology (Basel). 2025 Aug 16;14(8):1066. doi: 10.3390/biology14081066.

A practical guide to identifying associations between tandem repeats and complex human traits using consensus genotypes from multiple tools.

Nat Protoc. 2025 Sep 1. doi: 10.1038/s41596-025-01231-y.

Long-Read Sequencing and Structural Variant Detection: Unlocking the Hidden Genome in Rare Genetic Disorders.

Diagnostics (Basel). 2025 Jul 17;15(14):1803. doi: 10.3390/diagnostics15141803.

Exploiting uniqueness: seed-chain-extend alignment on elastic founder graphs.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i265-i274. doi: 10.1093/bioinformatics/btaf225.

A survey of sequence-to-graph mapping algorithms in the pangenome era.

Genome Biol. 2025 May 22;26(1):138. doi: 10.1186/s13059-025-03606-6.

The impact of ancestral, genetic, and environmental influences on germline de novo mutation rates and spectra.

Nat Commun. 2025 May 15;16(1):4527. doi: 10.1038/s41467-025-59750-x.

Methodological opportunities in genomic data analysis to advance health equity.

Nat Rev Genet. 2025 May 15. doi: 10.1038/s41576-025-00839-w.

Pangenome graph mitigates heterozygosity overestimation from mapping bias: a case study in Chinese indigenous pigs.

BMC Biol. 2025 Mar 26;23(1):89. doi: 10.1186/s12915-025-02194-y.

K-mer-based Approaches to Bridging Pangenomics and Population Genetics.

Mol Biol Evol. 2025 Mar 5;42(3). doi: 10.1093/molbev/msaf047.

SVLearn: a dual-reference machine learning approach enables accurate cross-species genotyping of structural variants.

Nat Commun. 2025 Mar 11;16(1):2406. doi: 10.1038/s41467-025-57756-z.
