HACSim: an R package to estimate intraspecific sample sizes for genetic diversity assessment using haplotype accumulation curves.

Author information

Phillips Jarrett D, French Steven H, Hanner Robert H, Gillis Daniel J

Affiliations

School of Computer Science, University of Guelph, Guelph, Ontario, Canada.

Department of Integrative Biology, Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada.

Publication information

PeerJ Comput Sci. 2020 Jan 6;6:e243. doi: 10.7717/peerj-cs.243. eCollection 2020.

DOI:10.7717/peerj-cs.243

PMID:33816897

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7924493/

Abstract

Assessing levels of standing genetic variation within species requires a robust sampling for the purpose of accurate specimen identification using molecular techniques such as DNA barcoding; however, statistical estimators for what constitutes a robust sample are currently lacking. Moreover, such estimates are needed because most species are currently represented by only one or a few sequences in existing databases, which can safely be assumed to be undersampled. Unfortunately, sample sizes of 5-10 specimens per species typically seen in DNA barcoding studies are often insufficient to adequately capture within-species genetic diversity. Here, we introduce a novel iterative extrapolation simulation algorithm of haplotype accumulation curves, called HACSim (Haplotype Accumulation Curve Simulator), that can be employed to calculate likely sample sizes needed to observe the full range of DNA barcode haplotype variation that exists for a species. Using uniform and non-uniform haplotype frequency distributions, the notion of sampling sufficiency (the sample size at which sampling accuracy is maximized and above which no new sampling information is likely to be gained) can be gleaned. HACSim can be employed in two primary ways to estimate specimen sample sizes: (1) to simulate haplotype sampling in hypothetical species, and (2) to simulate haplotype sampling in real species mined from public reference sequence databases like the Barcode of Life Data Systems (BOLD) or GenBank for any genomic marker of interest. While our algorithm is globally convergent, runtime is heavily dependent on initial sample sizes and skewness of the corresponding haplotype frequency distribution.
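To make the idea of a haplotype accumulation curve concrete, the following is a minimal Python sketch, not the HACSim algorithm itself (the abstract does not spell out its extrapolation steps, and the package is written in R). It assumes a hypothetical species whose haplotype frequencies are known, draws specimens at random, and averages the number of distinct haplotypes seen at each sample size. The function names, the 95% recovery target, and the doubling search are all illustrative assumptions.

```python
import random


def accumulation_curve(probs, n_specimens, n_perms=200, seed=0):
    """Mean number of distinct haplotypes observed after 1..n_specimens draws.

    probs: assumed haplotype frequencies for a hypothetical species.
    """
    rng = random.Random(seed)
    haplotypes = list(range(len(probs)))
    totals = [0] * n_specimens
    for _ in range(n_perms):
        # Draw specimens with replacement according to the haplotype frequencies.
        sample = rng.choices(haplotypes, weights=probs, k=n_specimens)
        seen = set()
        for i, hap in enumerate(sample):
            seen.add(hap)
            totals[i] += len(seen)
    return [t / n_perms for t in totals]


def sample_size_for_recovery(probs, target=0.95, start_n=10,
                             max_n=100_000, n_perms=200, seed=0):
    """Smallest sample size whose mean recovered fraction reaches `target`.

    Illustrative search only: the curve length is doubled until the
    target fraction of haplotypes is reached or max_n is exceeded.
    """
    n = start_n
    while n <= max_n:
        curve = accumulation_curve(probs, n, n_perms, seed)
        for size, mean_h in enumerate(curve, start=1):
            if mean_h / len(probs) >= target:
                return size
        n *= 2
    return None  # target not reached within max_n specimens


# Ten haplotypes: one uniform and one strongly skewed frequency distribution.
uniform = [0.1] * 10
skewed = [0.91] + [0.01] * 9

n_uniform = sample_size_for_recovery(uniform)
n_skewed = sample_size_for_recovery(skewed)
print("uniform:", n_uniform, "skewed:", n_skewed)
```

Under these assumptions the skewed distribution needs many more specimens than the uniform one before rare haplotypes are likely to be observed, which is consistent with the abstract's remark that skewness of the haplotype frequency distribution strongly affects the outcome.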

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cd1/7924493/d2e776072721/peerj-cs-06-243-g001.jpg
