Department of Engineering Science, National Cheng Kung University, Tainan 701, Taiwan.
Plant Cell Physiol. 2011 Feb;52(2):238-43. doi: 10.1093/pcp/pcq201. Epub 2011 Jan 17.
Orchids are among the most ecologically and evolutionarily significant plants, and the Orchidaceae is one of the most abundant families of the angiosperms. Genetic databases are useful not only for gene discovery but also for future genomic annotation. For this purpose, OrchidBase was established from 37,979,342 sequence reads collected from 11 in-house Phalaenopsis orchid cDNA libraries. Among them, 41,310 expressed sequence tags (ESTs) were obtained by Sanger sequencing, whereas 37,908,032 reads were obtained by next-generation sequencing (NGS) on both Roche 454 and Solexa Illumina sequencers. These reads were assembled into 8,501 contigs and 76,116 singletons, resulting in 84,617 non-redundant transcribed sequences with an average length of 459 bp. The analysis pipeline of the database is an automated system written in Perl and C# that consists of the following components: automatic pre-processing of EST reads, assembly of raw sequences, annotation of the assembled sequences, and storage of the analyzed information in SQL databases. A web application was implemented with HTML and a Microsoft .NET Framework C# program for browsing and querying the database, creating dynamic web pages on the client side, analyzing gene ontology (GO) and mapping annotated enzymes to KEGG pathways. The online resources for putative annotation can be searched either by text or by BLAST, and the results can be explored on the website and downloaded. The establishment of OrchidBase will thus provide researchers with a high-quality genetic resource for data mining and facilitate efficient experimental studies on orchid biology and biotechnology. The OrchidBase database is freely available at http://lab.fhes.tn.edu.tw/est.
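The four pipeline stages named in the abstract (pre-processing, assembly, annotation, storage in SQL) can be illustrated with a minimal Python sketch. This is not the OrchidBase implementation, which is written in Perl and C#; every function name, the `MIN_LENGTH` cutoff, the table schema, and the annotation lookup are hypothetical. In particular, the `assemble` step is only a toy stand-in that collapses identical reads, not a real sequence assembler.

```python
import sqlite3

MIN_LENGTH = 8  # hypothetical length cutoff, not taken from the source


def preprocess(reads):
    """Normalize reads and drop those that are too short or contain
    characters other than A, C, G, T, N."""
    cleaned = []
    for read in reads:
        seq = read.strip().upper()
        if len(seq) >= MIN_LENGTH and set(seq) <= set("ACGTN"):
            cleaned.append(seq)
    return cleaned


def assemble(reads):
    """Toy stand-in for assembly: reads seen more than once collapse into
    one 'contig'; reads seen exactly once stay 'singletons'."""
    counts = {}
    for seq in reads:
        counts[seq] = counts.get(seq, 0) + 1
    contigs = [s for s, n in counts.items() if n > 1]
    singletons = [s for s, n in counts.items() if n == 1]
    return contigs, singletons


def annotate(seq, lookup):
    """Hypothetical annotation step: a plain dictionary lookup standing in
    for a similarity search against a reference database."""
    return lookup.get(seq, "unannotated")


def store(conn, contigs, singletons, lookup):
    """Write the non-redundant sequences and their annotations to SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        "id INTEGER PRIMARY KEY, kind TEXT, sequence TEXT, "
        "length INTEGER, annotation TEXT)"
    )
    rows = [("contig", s, len(s), annotate(s, lookup)) for s in contigs]
    rows += [("singleton", s, len(s), annotate(s, lookup)) for s in singletons]
    conn.executemany(
        "INSERT INTO transcripts (kind, sequence, length, annotation) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def run_pipeline(raw_reads, lookup):
    """Run all four stages against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    contigs, singletons = assemble(preprocess(raw_reads))
    store(conn, contigs, singletons, lookup)
    return conn
```

In this sketch the number of stored rows equals contigs plus singletons, mirroring how the abstract's 8,501 contigs and 76,116 singletons sum to 84,617 non-redundant sequences.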