ngsJulia: population genetic analysis of next-generation DNA sequencing data with Julia language.

Affiliations

Department of Life Sciences, Imperial College London, London, UK.

Institute of Population Genetics, University of Veterinary Medicine Vienna, Vienna, Austria.

Publication information

F1000Res. 2023 Jul 14;11:126. doi: 10.12688/f1000research.104368.2. eCollection 2022.

Abstract

Sound analysis of DNA sequencing data is essential for extracting meaningful information and inferring quantities of interest. Sequencing and mapping errors, coupled with low and variable coverage, hamper the identification of genotypes and variants and the estimation of population genetic parameters. Currently available methods and implementations for estimating population genetic parameters from sequencing data are either suitable only for the analysis of genomes from model organisms, require moderate sequencing coverage, or are not easily adaptable to specific applications. To address these issues, we introduce ngsJulia, a collection of templates and functions in the Julia language to process short-read sequencing data for population genetic analysis. We further describe two implementations, ngsPool and ngsPloidy, for the analysis of pooled sequencing data and polyploid genomes, respectively. Through simulations, we illustrate the performance of these implementations in estimating various population genetic parameters, using both established and novel statistical methods. These results inform optimal experimental design and demonstrate the applicability of the methods in ngsJulia to estimate parameters of interest even from low-coverage sequencing data. ngsJulia provides users with a flexible and efficient framework for ad hoc analysis of sequencing data. ngsJulia is available from: https://github.com/mfumagalli/ngsJulia.

Similar articles

1
ngsJulia: population genetic analysis of next-generation DNA sequencing data with Julia language.
F1000Res. 2023 Jul 14;11:126. doi: 10.12688/f1000research.104368.2. eCollection 2022.
2
SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales.
J Bioinform Comput Biol. 2024 Oct;22(5):2450022. doi: 10.1142/S0219720024500227. Epub 2024 Oct 1.
3
Cloud-based introduction to BASH programming for biologists.
Brief Bioinform. 2024 Jul 23;25(Supplement_1). doi: 10.1093/bib/bbae244.
6
Can a Liquid Biopsy Detect Circulating Tumor DNA With Low-passage Whole-genome Sequencing in Patients With a Sarcoma? A Pilot Evaluation.
Clin Orthop Relat Res. 2025 Jan 1;483(1):39-48. doi: 10.1097/CORR.0000000000003161. Epub 2024 Jun 21.
8
Evaluation of sequencing reads at scale using rdeval.
Bioinformatics. 2025 Jul 22. doi: 10.1093/bioinformatics/btaf416.
