Gopalan Vivek, Tan Tin Wee, Lee Bernett T K, Ranganathan Shoba
Department of Biochemistry, National University of Singapore, Singapore 119260.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D59-63. doi: 10.1093/nar/gkh051.
Xpro is a relational database that holds all eukaryotic protein-encoding DNA sequences in GenBank, together with the associated data required for analysing eukaryotic gene architecture. In addition to the information found in the GenBank records, which includes the sequence, position, length and description of introns, exons and protein-coding regions, Xpro provides annotations of splice sites and intron phases. Furthermore, Xpro validates intron positions using alignments between each record's sequence and EST sequences from dbEST. The validation process also yields alternative splicing information, which is stored in the database. Intron-containing genes in Xpro are classified as experimental or predicted, based on the intron position validation and on specific keywords that appear in the GenBank records of predicted genes. An Entrez-like query system, which is familiar to most biologists, is provided for accessing the information in the database. A non-redundant set of Xpro database contents is also obtained by cross-referencing to the Swiss-Prot/TrEMBL and Pfam databases. The database currently contains information for 493,983 genes: 351,918 intron-containing genes and 142,065 intron-less genes. Xpro is updated for each new GenBank release and is freely available via the internet at http://origin.bic.nus.edu.sg/xpro.
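The splice-site and intron-phase annotations mentioned above can be illustrated with a minimal Python sketch. It is not Xpro's implementation or schema: the function names, the tuple representation of CDS segments and the toy sequence are all hypothetical. It assumes a forward-strand gene whose CDS segments are given as 1-based inclusive coordinates (as in a GenBank `join(...)` location) and uses the standard definition of intron phase as the number of coding bases preceding the intron, modulo 3.

```python
from typing import List, Tuple

# Hypothetical representation: 1-based inclusive (start, end), GenBank-style,
# listed in transcript order on the forward strand.
Segment = Tuple[int, int]


def intron_phases(cds_segments: List[Segment]) -> List[int]:
    """Return the phase (0, 1 or 2) of each intron between consecutive CDS segments."""
    phases = []
    coding_bases = 0
    for start, end in cds_segments[:-1]:
        coding_bases += end - start + 1
        # Phase 0: intron lies between two codons; 1 or 2: it splits a codon.
        phases.append(coding_bases % 3)
    return phases


def splice_sites(sequence: str, cds_segments: List[Segment]) -> List[Tuple[str, str]]:
    """Return the (donor, acceptor) dinucleotides flanking each intron."""
    sites = []
    for (_, exon_end), (next_start, _) in zip(cds_segments, cds_segments[1:]):
        donor = sequence[exon_end:exon_end + 2]             # first two intron bases
        acceptor = sequence[next_start - 3:next_start - 1]  # last two intron bases
        sites.append((donor.upper(), acceptor.upper()))
    return sites


# Toy record (made up for illustration): a 7-base exon, a 12-base intron, a 5-base exon.
toy_sequence = "ATGGCCA" + "GTAAGTTTTCAG" + "AGTAA"
toy_segments = [(1, 7), (20, 24)]

print(intron_phases(toy_segments))               # [1]
print(splice_sites(toy_sequence, toy_segments))  # [('GT', 'AG')]
```

In the toy record the intron follows seven coding bases, so it interrupts a codon after its first base (phase 1), and it carries the canonical GT donor and AG acceptor dinucleotides. Reverse-strand genes would need the coordinates and sequence reverse-complemented first, which this sketch does not handle.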