Yeats Corin, Lees Jonathan, Reid Adam, Kellam Paul, Martin Nigel, Liu Xinhui, Orengo Christine
UCL, Department of Molecular Biology & Biochemistry, Darwin Building, Gower St, London, UK.
Nucleic Acids Res. 2008 Jan;36(Database issue):D414-8. doi: 10.1093/nar/gkm1019. Epub 2007 Nov 21.
Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including those in the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated by scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation, and it can infer more ancient and divergent homology relationships than sequence-based approaches. These data are supplemented with Pfam-A, other non-domain structural predictions (e.g. coiled coils) and experimental data from UniProt. To broaden the investigations possible with these data, we have also incorporated a variety of protein annotation resources, including protein-protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of these data can be accessed through a newly redesigned website that focuses on flexibility and clarity, with searches that can be restricted to a single genome or run across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins, including 527 completed genomes. Gene3D is available at: http://gene3d.biochem.ucl.ac.uk/