ngsTools: methods for population genetics analyses from next-generation sequencing data.

Affiliations

Department of Integrative Biology, Department of Statistics, University of California, Berkeley, CA 94720, USA and Department of Biology, University of Copenhagen, Copenhagen 2200, Denmark.

Publication information

Bioinformatics. 2014 May 15;30(10):1486-7. doi: 10.1093/bioinformatics/btu041. Epub 2014 Jan 23.

DOI:10.1093/bioinformatics/btu041

PMID:24458950

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4016704/

Abstract

SUMMARY

Next-generation sequencing technologies produce short reads that are either de novo assembled or mapped to a reference genome. Genotypes and/or single-nucleotide polymorphisms are then determined from the read composition at each site, and these calls become the basis for many downstream analyses. However, at low sequencing depths there is considerable statistical uncertainty in the assignment of genotypes because of random sampling of homologous base pairs in heterozygotes and sequencing or alignment errors. Recently, several probabilistic methods have been proposed to account for this uncertainty and make accurate inferences from low-quality and/or low-coverage sequencing data. We present ngsTools, a collection of programs to perform population genetics analyses from next-generation sequencing data. The methods implemented in these programs do not rely on single-nucleotide polymorphism or genotype calling and are particularly suitable for low sequencing depth data.
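The genotype uncertainty described above can be illustrated with a minimal sketch. It is not the model implemented in ngsTools: it assumes a simple binomial read-sampling model for a biallelic diploid site, and the function names and the symmetric per-read error rate of 0.01 are hypothetical choices made only for illustration.

```python
from math import comb


def prob_het_looks_homozygous(depth: int) -> float:
    """Probability that every read at a heterozygous site shows the same allele.

    Assumption: each read samples one of the two alleles with probability 0.5
    and there are no sequencing errors.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2 * 0.5 ** depth


def genotype_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> dict:
    """Normalized likelihoods of the genotypes RR, RA and AA given allele counts.

    Assumption: reads are independent and a read reports the wrong allele
    with a symmetric probability `error` (an illustrative value).
    """
    n = n_ref + n_alt
    # Probability that a single read shows the alternate allele under each genotype.
    p_alt = {"RR": error, "RA": 0.5, "AA": 1.0 - error}
    raw = {
        g: comb(n, n_alt) * p ** n_alt * (1.0 - p) ** n_ref
        for g, p in p_alt.items()
    }
    total = sum(raw.values())
    return {g: v / total for g, v in raw.items()}


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth, prob_het_looks_homozygous(depth))
    print(genotype_likelihoods(n_ref=2, n_alt=0))
```

Under these assumptions, two reads that both carry the reference allele still leave the heterozygous genotype with a normalized likelihood of roughly 0.2, which is why methods that carry genotype likelihoods forward, rather than a single called genotype, are attractive at low depth.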

AVAILABILITY

Programs included in ngsTools are implemented in C/C++ and are freely available for noncommercial use at https://github.com/mfumagalli/ngsTools.

CONTACT

mfumagalli82@gmail.com

SUPPLEMENTARY INFORMATION

Supplementary materials are available at Bioinformatics online.

Similar articles

Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data.

Bioinformatics. 2013 Sep 15;29(18):2245-52. doi: 10.1093/bioinformatics/btt386. Epub 2013 Jul 3.

LocalNgsRelate: a software tool for inferring IBD sharing along the genome between pairs of individuals from low-depth NGS data.

Bioinformatics. 2022 Jan 27;38(4):1159-1161. doi: 10.1093/bioinformatics/btab732.

One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies.

PLoS Comput Biol. 2015 Aug 12;11(8):e1004448. doi: 10.1371/journal.pcbi.1004448. eCollection 2015 Aug.

Error filtering, pair assembly and error correction for next-generation sequencing reads.

Bioinformatics. 2015 Nov 1;31(21):3476-82. doi: 10.1093/bioinformatics/btv401. Epub 2015 Jul 2.

Review of alignment and SNP calling algorithms for next-generation sequencing data.

J Appl Genet. 2016 Feb;57(1):71-9. doi: 10.1007/s13353-015-0292-7. Epub 2015 Jun 9.

Robust inference of population structure from next-generation sequencing data with systematic differences in sequencing.

Bioinformatics. 2018 Apr 1;34(7):1157-1163. doi: 10.1093/bioinformatics/btx708.

polyRAD: Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids.

G3 (Bethesda). 2019 Mar 7;9(3):663-673. doi: 10.1534/g3.118.200913.

NGSremix: a software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data.

G3 (Bethesda). 2021 Aug 7;11(8). doi: 10.1093/g3journal/jkab174.

RepLong: de novo repeat identification using long read sequencing data.

Bioinformatics. 2018 Apr 1;34(7):1099-1107. doi: 10.1093/bioinformatics/btx717.

Cited by

Biocultural vulnerability of traditional crops in the Indian Trans-Himalaya.

Sci Adv. 2025 Aug 15;11(33):eadu6611. doi: 10.1126/sciadv.adu6611.

Efficient Detection and Characterization of Targets of Natural Selection Using Transfer Learning.

Mol Biol Evol. 2025 Apr 30;42(5). doi: 10.1093/molbev/msaf094.

Fine Scale Patterns of Population Structure and Connectivity in Scandinavian Flat Oysters in Scandinavia ( L.).

Evol Appl. 2025 Mar 31;18(4):e70096. doi: 10.1111/eva.70096. eCollection 2025 Apr.

Efficient detection and characterization of targets of natural selection using transfer learning.

bioRxiv. 2025 Mar 6:2025.03.05.641710. doi: 10.1101/2025.03.05.641710.

Translocations spur population growth but fail to prevent genetic erosion in imperiled Florida Scrub-Jays.

Curr Biol. 2025 Mar 24;35(6):1391-1399.e6. doi: 10.1016/j.cub.2025.01.058. Epub 2025 Feb 27.

Kinship clustering within an ecologically diverse killer whale metapopulation.

Heredity (Edinb). 2025 Feb;134(2):109-119. doi: 10.1038/s41437-024-00740-y. Epub 2025 Jan 20.

Genomic Insights Into Red Squirrels in Scotland Reveal Loss of Heterozygosity Associated With Extreme Founder Effects.

Evol Appl. 2025 Jan 15;18(1):e70072. doi: 10.1111/eva.70072. eCollection 2025 Jan.

Divergence in Regulatory Regions and Gene Duplications May Underlie Chronobiological Adaptation in Desert Tortoises.

Mol Ecol. 2025 Jan;34(2):e17600. doi: 10.1111/mec.17600. Epub 2024 Dec 3.

Rapid speciation in the holopelagic ctenophore following glacial recession.

bioRxiv. 2024 Nov 9:2024.10.10.617593. doi: 10.1101/2024.10.10.617593.

Prioritizing Conservation Areas for the Hyacinth Macaw () in Brazil From Low-Coverage Genomic Data.

Evol Appl. 2024 Nov 18;17(11):e70039. doi: 10.1111/eva.70039. eCollection 2024 Nov.
