Depledge Daniel P, Dalby Andrew R
School of Biological and Chemical Sciences and Engineering, Washington Singer Laboratories, University of Exeter, Prince of Wales Road, Exeter, EX4 4PS, UK.
BMC Bioinformatics. 2005 Aug 3;6:196. doi: 10.1186/1471-2105-6-196.
Single amino acid repeats make up a significant proportion of every proteome determined to date. They have been shown to be functionally and medically significant, and are associated with cancers and with neurodegenerative diseases such as Huntington's chorea, which is caused by a poly-glutamine repeat. The COPASAAR database is a new tool for the rapid analysis of single amino acid repeats at the proteome level. It aims to simplify the comparison of repeat distributions between proteomes, in order to provide a better understanding of their function and evolution.
A comparative analysis of all proteomes in the database (currently 244) shows that single amino acid repeats account for about 12-14% of the proteome of any given species. They are slightly more common in eukaryotes (14%) than in archaea or bacteria (both 13%). Analyses of individual proteomes show that long single amino acid repeats (6+ residues) are much more common in eukaryotes, and that longer repeats are usually made up of hydrophilic amino acids such as glutamine, glutamic acid, asparagine, aspartic acid and serine.
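As a rough illustration of the kind of measurement described above, the following Python sketch finds runs of a single repeated residue in protein sequences and reports the fraction of residues that fall inside such runs. It is not the COPASAAR implementation: the function names, the toy sequences, and the default minimum run length of 2 residues are all assumptions made for this example, since the abstract does not state the threshold used by the database.

```python
from itertools import groupby


def find_repeats(sequence, min_length=2):
    """Return (residue, start, length) for each run of one repeated residue.

    min_length is an assumed threshold, not a value taken from COPASAAR.
    """
    repeats = []
    position = 0
    for residue, run in groupby(sequence):
        length = sum(1 for _ in run)
        if length >= min_length:
            repeats.append((residue, position, length))
        position += length
    return repeats


def repeat_fraction(sequences, min_length=2):
    """Fraction of all residues that lie inside single amino acid repeats."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    in_repeats = sum(
        length
        for seq in sequences
        for _, _, length in find_repeats(seq, min_length)
    )
    return in_repeats / total


# Hypothetical toy "proteome" of two short sequences (not real data).
toy_proteome = ["MAQQQQQQQLSSTK", "MKKLLAEEDGS"]

print(find_repeats(toy_proteome[0]))
print(repeat_fraction(toy_proteome))                 # all repeats of 2+ residues
print(repeat_fraction(toy_proteome, min_length=6))   # long repeats only (6+ residues)
```

Raising `min_length` to 6 restricts the count to the long repeats discussed above, so the same two functions can be used to compare overall and long-repeat content between sets of sequences.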
COPASAAR is a useful tool for comparative proteomics, providing rapid access to amino acid repeat data that can be readily data-mined. The database can be queried at the kingdom, proteome or individual protein level. As the amount of available proteome data increases, this kind of access will become increasingly important for automating proteome comparison. These studies should give better insight into the evolution of protein sequence and function.