Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal.

Affiliations

Department of Animal Behavior, University of Bielefeld, Postfach 100131, 33615, Bielefeld, Germany.

British Antarctic Survey, High Cross, Madingley Road, Cambridge, CB3 0ET, UK.

Publication information

BMC Genomics. 2019 Jan 22;20(1):72. doi: 10.1186/s12864-019-5440-8.

DOI:10.1186/s12864-019-5440-8

PMID:30669975

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6341687/

Abstract

BACKGROUND

Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of obtained RAD loci and depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed, which uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. However, in practice the actual number of loci obtained could potentially differ as incomplete enzymatic digestion or patchy sequence coverage across the genome might lead to some loci not being represented in a RAD dataset, while erroneous assembly could potentially inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies.
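The two quantities compared in this background, a model-based prediction and an in silico digestion count, can be illustrated with a minimal sketch. This is not PredRAD's implementation: it covers only a simplified GC content model that assumes bases occur independently, with G and C each at frequency GC/2 and A and T each at (1 - GC)/2. The function names are hypothetical, and the SbfI recognition sequence CCTGCAGG is used purely as an illustrative enzyme, not one taken from the abstract.

```python
def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def site_probability_gc(site: str, gc: float) -> float:
    """Probability that a random position starts the recognition site,
    assuming independent bases (simplifying assumption, not PredRAD itself)."""
    prob = 1.0
    for base in site.upper():
        prob *= gc / 2 if base in "GC" else (1 - gc) / 2
    return prob


def expected_sites_gc(site: str, gc: float, genome_length: int) -> float:
    """Expected number of sites: per-position probability times the
    number of windows of len(site) in a sequence of genome_length."""
    windows = max(genome_length - len(site) + 1, 0)
    return site_probability_gc(site, gc) * windows


def count_sites_in_silico(seq: str, site: str) -> int:
    """Count occurrences of the site on one strand, overlaps included.
    For a palindromic site such as CCTGCAGG, one strand is sufficient."""
    seq, site = seq.upper(), site.upper()
    count = 0
    start = seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


if __name__ == "__main__":
    toy_sequence = "AACCTGCAGGTTCCTGCAGG"  # made-up sequence for illustration
    site = "CCTGCAGG"
    gc = gc_content(toy_sequence)
    print("GC content:", gc)
    print("Expected sites (GC model):", expected_sites_gc(site, gc, len(toy_sequence)))
    print("Observed sites (in silico digestion):", count_sites_in_silico(toy_sequence, site))
```

In a real comparison the GC content would be estimated from a transcriptome or genome assembly and the expected count scaled to the assumed genome size. The di- and trinucleotide models mentioned above relax the independence assumption and are not covered by this sketch.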

RESULTS

PredRAD yielded consistently higher predicted numbers of restriction sites for the transcriptome assembly relative to the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide and trinucleotide models applied to the transcriptome and the genome assemblies respectively generated predictions that were closest to the number of restriction sites estimated by in silico digestion. Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the expectation based on in silico digestion.

CONCLUSIONS

Our study reveals generally high concordance between PredRAD predictions and empirical estimates of the number of RAD loci. This further supports the utility of PredRAD, while also suggesting that it may be feasible to sequence and assemble the majority of RAD loci present in an organism's genome.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/241d/6341687/9e8206ad2499/12864_2019_5440_Fig1_HTML.jpg

Similar articles

Predicting RAD-seq Marker Numbers across the Eukaryotic Tree of Life.

Genome Biol Evol. 2015 Nov 3;7(12):3207-25. doi: 10.1093/gbe/evv210.

RAD Sequencing and a Hybrid Antarctic Fur Seal Genome Assembly Reveal Rapidly Decaying Linkage Disequilibrium, Global Population Structure and Evidence for Inbreeding.

G3 (Bethesda). 2018 Jul 31;8(8):2709-2722. doi: 10.1534/g3.118.200171.

Gene discovery in the Antarctic fur seal (Arctocephalus gazella) skin transcriptome.

Mol Ecol Resour. 2011 Jul;11(4):703-10. doi: 10.1111/j.1755-0998.2011.02999.x. Epub 2011 Mar 16.

A draft fur seal genome provides insights into factors affecting SNP validation and how to mitigate them.

Mol Ecol Resour. 2016 Jul;16(4):909-21. doi: 10.1111/1755-0998.12502. Epub 2016 Jan 20.

Breaking RAD: an evaluation of the utility of restriction site-associated DNA sequencing for genome scans of adaptation.

Mol Ecol Resour. 2017 Mar;17(2):142-152. doi: 10.1111/1755-0998.12635. Epub 2016 Dec 16.

Challenges and advances for transcriptome assembly in non-model species.

PLoS One. 2017 Sep 20;12(9):e0185020. doi: 10.1371/journal.pone.0185020. eCollection 2017.

Comparative performance of transcriptome assembly methods for non-model organisms.

BMC Genomics. 2016 Jul 27;17:523. doi: 10.1186/s12864-016-2923-8.

A novel approach for mining polymorphic microsatellite markers in silico.

PLoS One. 2011;6(8):e23283. doi: 10.1371/journal.pone.0023283. Epub 2011 Aug 10.

Genetic sex assignment in wild populations using genotyping-by-sequencing data: A statistical threshold approach.

Mol Ecol Resour. 2018 Mar;18(2):179-190. doi: 10.1111/1755-0998.12767. Epub 2018 Mar 3.

Cited by

Differentiation of European yellow rust subraces within the 'Warrior(-)' genetic group.

PLoS One. 2025 May 23;20(5):e0323046. doi: 10.1371/journal.pone.0323046. eCollection 2025.

Genetic variability of the 16S rRNA gene of Nocardia brasiliensis, the most common causative agent of actinomycetoma in Latin America and the Caribbean.

Rev Inst Med Trop Sao Paulo. 2023 Apr 14;65:e31. doi: 10.1590/S1678-9946202365031. eCollection 2023.

Reduced metagenome sequencing for strain-resolution taxonomic profiles.

Microbiome. 2021 Mar 29;9(1):79. doi: 10.1186/s40168-021-01019-8.
