Mochida Keiichi, Yoshida Takuhiro, Sakurai Tetsuya, Ogihara Yasunari, Shinozaki Kazuo
Plant Science Center, RIKEN, Yokohama 230-0045, Japan.
Plant Physiol. 2009 Jul;150(3):1135-46. doi: 10.1104/pp.109.138214. Epub 2009 May 15.
The Triticeae Full-Length CDS Database (TriFLDB) collects the available information on full-length coding sequences (CDSs) of the Triticeae crops wheat (Triticum aestivum) and barley (Hordeum vulgare), together with functional annotations and comparative genomics features. Its search interface gives access to the annotations of Triticeae full-length CDS (TriFLCDS) entries in two ways: keyword searches on gene function and related Gene Ontology terms, and similarity searches against DNA and deduced amino acid sequences. The annotations consist of similarity search results against several sequence databases and domain structure predictions by InterProScan. The deduced amino acid sequences in TriFLDB are grouped with the proteome datasets of Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and sorghum (Sorghum bicolor) by hierarchical clustering at stepwise thresholds of sequence identity, so the resulting clusters are based on full-length protein sequences. The database also provides sequence similarity results from comparative mapping of TriFLCDSs onto the rice and sorghum genome sequences; together with current annotations, these can be used to predict gene structures for TriFLCDS entries. To indicate the possible genetic locations of full-length CDSs, TriFLCDS entries are also assigned to the genetically mapped cDNA sequences of barley and diploid wheat that are currently held in the Triticeae Mapped EST Database. These relational data are searchable from the search interfaces of both databases. TriFLDB currently contains 15,871 full-length CDSs from barley and wheat and includes putative full-length cDNAs for both species, which are publicly accessible. Together, these contents provide an informatics gateway for Triticeae genomics and grass comparative genomics. TriFLDB is publicly available at http://TriFLDB.psc.riken.jp/.
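The idea of grouping sequences by hierarchical clustering at stepwise identity thresholds can be illustrated with a minimal sketch. The abstract does not state the linkage method, the threshold values, or how identity was computed, so everything below is an assumption made for illustration: single-linkage grouping, the thresholds 0.9, 0.7, and 0.5, the function names, the sequence identifiers, and the toy identity values are all hypothetical and are not taken from TriFLDB.

```python
from itertools import combinations


def cluster_at_threshold(ids, identity, threshold):
    """Group ids so that any pair with identity >= threshold shares a cluster.

    Single linkage is an assumption here; the abstract does not name the method.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if identity.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(members) for members in groups.values())


def stepwise_clusters(ids, identity, thresholds):
    """Cluster once per threshold, from the strictest to the most permissive."""
    return {
        t: cluster_at_threshold(ids, identity, t)
        for t in sorted(thresholds, reverse=True)
    }


# Hypothetical pairwise identities (fractions); unlisted pairs count as 0.0.
ids = ["wheat_1", "barley_1", "rice_1", "arabidopsis_1"]
identity = {
    frozenset(("wheat_1", "barley_1")): 0.95,
    frozenset(("wheat_1", "rice_1")): 0.80,
    frozenset(("barley_1", "rice_1")): 0.78,
    frozenset(("rice_1", "arabidopsis_1")): 0.55,
}

levels = stepwise_clusters(ids, identity, [0.9, 0.7, 0.5])
for threshold, clusters in levels.items():
    print(threshold, clusters)
```

In this toy example the wheat and barley sequences group together at the strictest threshold, the rice sequence joins them at the intermediate one, and the Arabidopsis sequence joins only at the most permissive one, which is the kind of nested grouping that stepwise thresholds produce.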