The tea plant reference genome and improved gene annotation using long-read and paired-end sequencing data.

Affiliations

State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, China.

BGI-Shenzhen, Shenzhen, 518083, China.

Publication information

Sci Data. 2019 Jul 15;6(1):122. doi: 10.1038/s41597-019-0127-1.

DOI:10.1038/s41597-019-0127-1

PMID:31308375

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6629666/

Abstract

Tea is a globally consumed non-alcoholic beverage of great economic importance. However, the lack of a reference genome has largely hampered the use of valuable tea plant genetic resources in breeding. To address this issue, we previously generated a high-quality reference genome of the tea plant using Illumina and PacBio sequencing technologies, which produced a total of 2,124 Gb of short-read and 125 Gb of long-read data, respectively. A hybrid strategy was used to assemble the tea genome, which has been publicly released. Here we describe the data framework used to generate, annotate and validate the genome assembly. In addition, we re-predicted the protein-coding genes and annotated their putative functions using more comprehensive omics datasets and improved training models. We reassessed the assembly and annotation quality using the latest version of BUSCO. These data can be used to develop new methods and tools for better assembly of complex genomes and to aid the discovery of novel genes, variations and evolutionary clues associated with tea quality, thereby helping to breed new varieties with high yield and better quality in the future.
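To put the reported read totals in context, average sequencing depth can be estimated as total sequenced bases divided by genome size. The sketch below is illustrative only: the abstract does not state a genome size, so `ASSUMED_GENOME_SIZE_GB` is a hypothetical placeholder, and the function name `sequencing_depth` is likewise introduced here for illustration rather than taken from the authors' pipeline.

```python
def sequencing_depth(total_bases_gb: float, genome_size_gb: float) -> float:
    """Return average fold-coverage: sequenced bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome_size_gb must be positive")
    return total_bases_gb / genome_size_gb


# Read totals reported in the abstract, in gigabases (Gb).
SHORT_READ_GB = 2124.0  # Illumina short reads
LONG_READ_GB = 125.0    # PacBio long reads

# ASSUMPTION: placeholder genome size; the abstract does not give one.
# Replace with the estimated genome size of the accession being studied.
ASSUMED_GENOME_SIZE_GB = 3.0

short_depth = sequencing_depth(SHORT_READ_GB, ASSUMED_GENOME_SIZE_GB)
long_depth = sequencing_depth(LONG_READ_GB, ASSUMED_GENOME_SIZE_GB)

print(f"short-read depth ~{short_depth:.0f}x, long-read depth ~{long_depth:.0f}x")
```

Because depth scales inversely with genome size, the resulting figures are only as reliable as the size estimate supplied; the read totals themselves are the only values taken from the abstract.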
