A high-throughput SNP discovery strategy for RNA-seq data.

Author information

Zhejiang Provincial Key Laboratory of Horticultural Plant Integrative Biology, Zhejiang University, Zijingang Campus, Hangzhou, China.

Publication information

BMC Genomics. 2019 Feb 27;20(1):160. doi: 10.1186/s12864-019-5533-4.

DOI:10.1186/s12864-019-5533-4

PMID:30813897

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6391812/

Abstract

BACKGROUND

Single nucleotide polymorphisms (SNPs) are important molecular markers in genetics and breeding studies. The rapid advance of next-generation sequencing (NGS) provides a high-throughput means of SNP discovery. However, SNP development is limited by the availability of reliable SNP discovery methods. In particular, the optimal assembler and SNP caller for accurate SNP prediction from NGS data are not known.

RESULTS

We predicted SNPs from RNA-seq data of peach and mandarin peel tissue, comprehensively comparing two paired-end read lengths (125 bp and 150 bp), five assemblers (Trinity, IDBA, oases, SOAPdenovo, Trans-abyss) and two SNP callers (GATK and GBS). The predicted SNPs were compared with authentic SNPs identified by PCR amplification followed by gene cloning and sequencing. In total, 40 authentic SNPs were found in five anthocyanin biosynthesis-related genes in peach and 240 in nine carotenogenic genes in mandarin. Different strategies applied to the same RNA-seq data predicted quite divergent sets of putative SNPs. The false positive rate was significantly lower with 150 bp than with 125 bp paired-end reads. Trinity was superior to the other four assemblers, and GATK was substantially superior to GBS because it missed fewer authentic SNPs. The combination of the Trinity assembler, the GATK SNP caller, and 150 bp paired-end reads performed best, with 100% accuracy in both the peach and mandarin cases. This strategy was then applied to characterize SNPs in the peach and mandarin transcriptomes.
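The comparison of predicted and authentic SNPs described above can be illustrated with a minimal Python sketch. The abstract does not give exact formulas, so the definitions here are assumptions: the false positive rate is taken as the share of predicted SNPs absent from the authentic set, and the missing rate as the share of authentic SNPs that were not predicted. The `Snp` record, the `compare_snps` function, and the gene names and positions are hypothetical and are not data from the study.

```python
from __future__ import annotations

from typing import NamedTuple


class Snp(NamedTuple):
    """A SNP keyed by gene, 1-based position, and the two alleles (hypothetical layout)."""
    gene: str
    position: int
    ref: str
    alt: str


def compare_snps(predicted: set[Snp], authentic: set[Snp]) -> dict[str, float]:
    """Compare a predicted SNP set against an authentic (validated) SNP set.

    Assumed definitions (not taken from the paper):
      false_positive_rate = predicted SNPs not in the authentic set / all predicted SNPs
      missing_rate        = authentic SNPs not predicted / all authentic SNPs
    """
    false_positives = predicted - authentic
    missed = authentic - predicted
    return {
        "false_positive_rate": len(false_positives) / len(predicted) if predicted else 0.0,
        "missing_rate": len(missed) / len(authentic) if authentic else 0.0,
    }


# Toy data: three validated SNPs, of which a hypothetical strategy recovers two
# and additionally reports one SNP that was not validated.
authentic = {
    Snp("geneA", 101, "A", "G"),
    Snp("geneA", 250, "C", "T"),
    Snp("geneB", 77, "G", "A"),
}
predicted = {
    Snp("geneA", 101, "A", "G"),
    Snp("geneA", 250, "C", "T"),
    Snp("geneB", 300, "T", "C"),
}

result = compare_snps(predicted, authentic)
print(result)
```

Under these assumed definitions, a strategy with no false positives and no missed SNPs would score 0.0 on both measures, which is one plausible reading of the "100% accuracy" reported for Trinity-GATK with 150 bp reads.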

CONCLUSIONS

By comparing authentic SNPs obtained by PCR cloning with putative SNPs predicted from different combinations of five assemblers, two SNP callers, and two paired-end read lengths, we identified a reliable and efficient strategy for SNP discovery from RNA-seq data: Trinity-GATK with 150 bp paired-end reads. This strategy discovered SNPs with 100% accuracy in the peach and mandarin cases and may be applicable to a wide range of plants and other organisms.





