Reference-free Association Mapping from Sequencing Reads Using k-mers.

Authors

Mehrab Zakaria, Mobin Jaiaid, Tahmid Ibrahim Asadullah, Pachter Lior, Rahman Atif

Affiliations

Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh.

Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh.

Publication information

Bio Protoc. 2020 Nov 5;10(21):e3815. doi: 10.21769/BioProtoc.3815.

DOI:10.21769/BioProtoc.3815

PMID:33659468

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7842384/

Abstract

Association mapping is the process of linking phenotypes with genotypes. In genome-wide association studies (GWAS), individuals are first genotyped using microarrays or by aligning sequenced reads to reference genomes. However, both of these approaches rely on reference genomes, which limits their application to organisms with no or incomplete reference genomes. To address this, reference-free association mapping methods have been developed. Here we present the protocol of an alignment-free method for association studies that is based on counting k-mers in sequenced reads, testing for associations between k-mers and the phenotype of interest, and locally assembling the statistically significant k-mers. The method can map associations of categorical phenotypes to sequence and structural variations without requiring prior sequencing of reference genomes.
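The first two steps named in the abstract (counting k-mers in reads, then testing each k-mer for association with a two-class phenotype) can be illustrated with a minimal Python sketch. It is not the protocol's implementation: the function names, the Poisson likelihood ratio test, and the Bonferroni threshold are assumptions chosen for illustration, and the local assembly step is omitted.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers in a collection of reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def poisson_lrt_pvalue(c1, n1, c2, n2):
    """Illustrative likelihood ratio test (an assumption, not taken from the source).

    c1, c2: counts of one k-mer in groups 1 and 2.
    n1, n2: total k-mer counts in each group (must be positive).
    Null: one shared Poisson rate; alternative: one rate per group.
    """
    def loglik(c, n, rate):
        # Poisson log-likelihood up to a constant that cancels in the ratio.
        return (c * math.log(rate * n) if c > 0 else 0.0) - rate * n

    pooled = (c1 + c2) / (n1 + n2)
    if pooled == 0:
        return 1.0
    null = loglik(c1, n1, pooled) + loglik(c2, n2, pooled)
    alt = loglik(c1, n1, c1 / n1) + loglik(c2, n2, c2 / n2)
    stat = max(2.0 * (alt - null), 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def associated_kmers(case_reads, control_reads, k, alpha=0.05):
    """Return k-mers whose Bonferroni-corrected p-value falls below alpha."""
    case = count_kmers(case_reads, k)
    control = count_kmers(control_reads, k)
    n_case, n_control = sum(case.values()), sum(control.values())
    kmers = set(case) | set(control)
    hits = {}
    for kmer in kmers:
        p = poisson_lrt_pvalue(case[kmer], n_case, control[kmer], n_control)
        if p * len(kmers) < alpha:
            hits[kmer] = p
    return hits
```

Calling `associated_kmers(case_reads, control_reads, k=31)` with two lists of read strings would return a dictionary of candidate k-mers and their uncorrected p-values. A real analysis would also need to handle sequencing depth per sample and confounders such as population structure, which this sketch ignores.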
