De novo transcriptome assembly databases for the butterfly orchid Phalaenopsis equestris.

Authors

Niu Shan-Ce, Xu Qing, Zhang Guo-Qiang, Zhang Yong-Qiang, Tsai Wen-Chieh, Hsu Jui-Ling, Liang Chieh-Kai, Luo Yi-Bo, Liu Zhong-Jian

Affiliations

State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China.

University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

Sci Data. 2016 Sep 27;3:160083. doi: 10.1038/sdata.2016.83.

Abstract

Orchids are renowned for their spectacular flowers and ecological adaptations. After the sequencing of the genome of the tropical epiphytic orchid Phalaenopsis equestris, we combined Illumina HiSeq2000 for RNA-Seq and Trinity for de novo assembly to characterize the transcriptomes for 11 diverse P. equestris tissues representing the root, stem, leaf, flower buds, column, lip, petal, sepal and three developmental stages of seeds. Our aims were to contribute to a better understanding of the molecular mechanisms driving the analysed tissue characteristics and to enrich the available data for P. equestris. Here, we present three databases. The first dataset is the RNA-Seq raw reads, which can be used to execute new experiments with different analysis approaches. The other two datasets allow different types of searches for candidate homologues. The second dataset includes the sets of assembled unigenes and predicted coding sequences and proteins, enabling a sequence-based search. The third dataset consists of the annotation results of the aligned unigenes versus the Nonredundant (Nr) protein database, Kyoto Encyclopaedia of Genes and Genomes (KEGG) and Clusters of Orthologous Groups (COG) databases with low e-values, enabling a name-based search.
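
As a rough illustration of the name-based search that the third dataset is meant to support, the sketch below filters an annotation table by keyword and e-value. The abstract does not specify a file layout, so the tab-separated format, the column names (`unigene_id`, `database`, `description`, `e_value`), the sample rows, and the `1e-5` cut-off are all assumptions made for this example and are not taken from the published data.

```python
import csv
import io

# Hypothetical annotation table: the layout and every row are invented for
# illustration and do not come from the published dataset.
SAMPLE = """unigene_id\tdatabase\tdescription\te_value
Unigene0001\tNr\tMADS-box transcription factor\t1e-50
Unigene0002\tKEGG\tchalcone synthase\t3e-20
Unigene0003\tCOG\tMADS-box protein\t0.5
"""


def search_by_name(table_text, keyword, max_e_value=1e-5):
    """Return rows whose description contains `keyword` (case-insensitive)
    and whose e-value is at most `max_e_value` (an assumed threshold)."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    keyword = keyword.lower()
    return [
        row
        for row in reader
        if keyword in row["description"].lower()
        and float(row["e_value"]) <= max_e_value
    ]


hits = search_by_name(SAMPLE, "mads-box")
for row in hits:
    print(row["unigene_id"], row["database"], row["e_value"])
```

A sequence-based search against the second dataset would instead align a query against the assembled unigenes or predicted proteins with an alignment tool; that step is outside the scope of this sketch.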
