Huntley Melanie A, Golding G Brian
Department of Biology, McMaster University, Hamilton, Ontario, Canada.
Proteins. 2002 Jul 1;48(1):134-40. doi: 10.1002/prot.10150.
Simple sequence is abundant in the proteins that have been sequenced to date. However, unusual protein features such as simple sequence are not present at the same high frequency in structural databases. A subset of these simple sequences, those with a highly repetitive nature, has been shown to be abundant in eukaryotes but not in prokaryotes. In this study, an examination of the eukaryotic proteins in the Protein Data Bank (PDB) revealed a large deficiency of low-complexity, highly repetitive protein repeats. Using simulated databases built from comparable samples of eukaryotic proteins drawn from the National Center for Biotechnology Information (NCBI) database, it is shown that the PDB contains significantly less highly repetitive simple sequence than artificial databases of similar composition randomly derived from NCBI. When the structural data for the few PDB sequences that do contain highly repetitive simple sequence are examined in detail, the tertiary structure is, in most cases, unknown for the regions consisting of simple sequence. This lack of simple sequence, both in the PDB and in the structural information, suggests that this type of simple sequence may produce disordered structures that make structural characterization difficult.
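The comparison described in the abstract, an observed database set against many randomly drawn sets of the same size, can be illustrated with a small resampling sketch. This is not the authors' procedure: the repeat detector below (a run of one identical residue) is only a stand-in for whatever measure of highly repetitive simple sequence the study used, and every function name, the `min_run` threshold, and the number of simulations are hypothetical choices made for illustration.

```python
import random
import re


def has_simple_repeat(seq, min_run=5):
    """Stand-in detector (an assumption, not the study's measure):
    True if any single residue occurs min_run or more times in a row."""
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    return re.search(pattern, seq) is not None


def count_repeat_proteins(seqs, min_run=5):
    """Number of sequences flagged by the stand-in detector."""
    return sum(has_simple_repeat(s, min_run) for s in seqs)


def resampling_p_value(observed_set, pool, n_sim=1000, min_run=5, seed=0):
    """Empirical one-sided p-value for a deficiency of repeat-containing
    proteins in observed_set, relative to random samples of equal size
    drawn without replacement from pool."""
    rng = random.Random(seed)
    observed = count_repeat_proteins(observed_set, min_run)
    # Flag each pool sequence once so every simulation only sums booleans.
    flags = [has_simple_repeat(s, min_run) for s in pool]
    as_low = 0
    for _ in range(n_sim):
        sample = rng.sample(flags, len(observed_set))
        if sum(sample) <= observed:
            as_low += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (as_low + 1) / (n_sim + 1)
```

Under these assumptions, a small p-value means that random samples from the pool rarely contain as few repeat-bearing proteins as the observed set, which is the kind of deficiency the abstract reports for the PDB. A real analysis would also need to match the composition of the samples to the observed set, which this sketch does not attempt.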