Belmabrouk Sabrine, Kharrat Najla, Abdelhedi Rania, Ben Ayed Amine, Benmarzoug Riadh, Rebai Ahmed
Centre de Biotechnologie de Sfax, Laboratoire de Procédés de Criblage Moléculaire et Cellulaire, P.O. Box 1177, 3018 Sfax, Tunisia.
BMC Genomics. 2017 Aug 8;18(1):588. doi: 10.1186/s12864-017-4000-3.
Studying how genetic variation is distributed in proteins containing charged regions, called charge clusters (CCs), is of great interest for unravelling their functional role. Charge clusters are segments of 20 to 75 residues with a high net positive charge, a high net negative charge, or a high total charge relative to the overall charge composition of the protein. We previously developed a bioinformatics tool (FCCP) to detect charge clusters in proteomes and used it to scan the human proteome for CCs. In this paper we investigate genetic variation in the human proteins that harbour CCs.
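The definition above is statistical: a segment counts as a cluster when its charge stands out against the protein's own composition. The sketch below illustrates that idea with a simple sliding-window z-score. It is a swapped-in illustration, not the FCCP algorithm. The charge model (Lys/Arg = +1, Asp/Glu = -1, His ignored), the single fixed window length, the cutoff of 3, and the function name `scan_charge_windows` are all assumptions, and only net-charge clusters are covered (not high total-charge clusters).

```python
import math

# Assumed charge model (not from the paper): Lys/Arg = +1, Asp/Glu = -1,
# every other residue (including His) = 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def scan_charge_windows(seq, window=20, z_cut=3.0):
    """Hypothetical helper: return (start, net_charge, z, label) for every
    window whose net charge deviates strongly from what the whole protein's
    average charge per residue would predict."""
    seq = seq.upper()
    n = len(seq)
    if n < window:
        return []
    charges = [CHARGE.get(aa, 0) for aa in seq]
    mean = sum(charges) / n                            # protein-wide mean charge per residue
    var = sum((c - mean) ** 2 for c in charges) / n    # per-residue variance
    if var == 0:
        return []  # uniform charge: no window can stand out
    sd = math.sqrt(var * window)  # s.d. of a window sum, assuming independent residues
    hits = []
    net = sum(charges[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one residue: add the new right end, drop the old left end.
            net += charges[start + window - 1] - charges[start - 1]
        z = (net - window * mean) / sd
        if z >= z_cut:
            hits.append((start, net, z, "positive"))
        elif z <= -z_cut:
            hits.append((start, net, z, "negative"))
    return hits
```

For example, a 140-residue sequence made of 60 uncharged residues, a run of 20 glutamates, and 60 more uncharged residues yields only "negative" hits, including the window that starts exactly at the glutamate run (position 60). A fuller scan would repeat this for each window length from 20 to 75 and merge overlapping hits; both steps are omitted here.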
We studied the coding regions of 317 positively charged clusters and 1020 negatively charged clusters previously detected in human proteins. The results revealed that the coding parts of CCs are richer in sequence variants than their corresponding genes, full mRNAs, and exonic plus intronic sequences, and that these variants are predominantly rare (minor allele frequency, MAF < 0.005). Furthermore, variants occurring in the coding parts of positively charged regions of proteins are more often pathogenic than those occurring in negatively charged ones. Classifying variants by type showed that substitutions are the major type, followed by indels (insertions/deletions). Among substitutions within clusters of both charges, charged amino acids were the residue group most often lost, whereas polar residues were the group most often gained.
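The rarity threshold and the gain/loss comparison above can be made concrete with a small tally. In this sketch a variant is called rare when its MAF is below 0.005, as stated in the abstract, and each amino-acid substitution counts as a loss for the reference residue's group and a gain for the alternate residue's group. The residue grouping (including placing His among polar residues), the helper names, and the `(reference, alternate)` input format are assumptions for illustration, not the classification used in the paper.

```python
from collections import Counter

# Assumed residue grouping (hypothetical); the paper's exact classes may differ.
GROUPS = {
    "charged": set("DEKR"),
    "polar": set("STNQCYH"),
    "hydrophobic": set("AVILMFW"),
    "special": set("GP"),
}


def residue_group(aa):
    """Map a one-letter amino-acid code to its (assumed) group."""
    for name, members in GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown residue: {aa!r}")


def is_rare(maf, cutoff=0.005):
    """A variant is rare if its minor allele frequency is below the cutoff."""
    return maf < cutoff


def group_balance(substitutions):
    """Net gain (positive) or loss (negative) per residue group over
    (reference_aa, alternate_aa) pairs; same-group swaps cancel out."""
    balance = Counter()
    for ref, alt in substitutions:
        balance[residue_group(ref)] -= 1
        balance[residue_group(alt)] += 1
    return balance


subs = [("E", "Q"), ("K", "N"), ("R", "W"), ("S", "T")]
print(dict(group_balance(subs)))  # {'charged': -3, 'polar': 2, 'hydrophobic': 1}
```

Counting net balance rather than raw gains and losses keeps synonymous-group swaps (such as Ser to Thr) from inflating either side.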
Our findings highlight prominent features of human charged regions, from the DNA up to the protein sequence, which may offer clues to improve the current understanding of these regions and their involvement in the emergence of disease.