Bannen Ryan M, Bingman Craig A, Phillips George N
Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53711, USA.
J Struct Funct Genomics. 2007 Dec;8(4):217-26. doi: 10.1007/s10969-008-9039-6. Epub 2008 Feb 27.
It has been previously shown that protein sequences containing a quasi-repetitive assortment of amino acids are common in genomes and databases such as Swiss-Prot but are under-represented in the structure-based Protein Data Bank (PDB). Structural genomics groups have been using the absence of these "low-complexity" sequences for several years as a way to select proteins that have a good chance of successful structure determination. In this study, we examine the data deposited in the PDB as well as the available data from structural genomics groups in TargetDB and PepcDB to reveal interesting trends that could be taken into consideration when using low-complexity sequences as part of the target selection process.