Facultad de Ciencias, UNAM, Ciudad Universitaria, Apdo. Postal 70-407, México D. F. 04510, Mexico; Laboratorios de Biológicos y Reactivos de México, Amores 1240, Colonia Del Valle, México D. F. 03100, Mexico.
J Theor Biol. 2013 Dec 7;338:80-6. doi: 10.1016/j.jtbi.2013.08.039. Epub 2013 Sep 8.
Low complexity regions (LCRs) are nucleic acid or protein sequences defined by a compositional bias. Their occurrence has been confirmed in sequences from the three cellular lineages (Bacteria, Archaea and Eucarya) and has also been reported in viral genomes. We present here the results of a detailed computational analysis of the LCRs present in the HIV-1 glycoprotein 120 (gp120), encoded by the viral gene env. The analysis was performed on a sample of 3637 Env polyprotein sequences derived from 4117 completely sequenced and translated HIV-1 genomes available in public databases as of December 2012. We identified 1229 LCRs located in four regions of the gp120 protein that correspond to four of the five regions identified as hypervariable (V1, V2, V4 and V5). The remaining 29 LCRs are found in the signal peptide and in the conserved regions C2, C3, C4 and C5. No LCR was identified in the hypervariable region V3. The LCRs detected in the V1, V2, V4 and V5 hypervariable regions have a high Asn content, which very likely corresponds to glycosylation sites that may contribute to the retrovirus's ability to evade the immune system. In sharp contrast with what is observed in gp120 proteins lacking LCRs, the glycosylation sites present in LCRs tend to cluster towards the center of the region, forming well-defined islands. The results presented here suggest that LCRs represent a hitherto undescribed source of genomic variability in lentiviruses, and that these repeats may be an important source of antigenic variation in HIV-1 populations. These results may also exemplify the evolutionary processes that could have increased the size of primitive cellular RNA genomes, and the role of LCRs as a source of raw material during the evolutionary acquisition of new functions.
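The abstract does not name the software or parameters used to detect LCRs, so the following is only a minimal Python sketch of the kind of analysis described, not the authors' method. It flags low-complexity stretches with a sliding-window Shannon entropy, measures Asn content, and locates candidate N-glycosylation sequons (N-X-S/T, with X not Pro). The function names, the window length of 12 and the entropy threshold of 2.2 bits are illustrative assumptions.

```python
import math
import re
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition."""
    n = len(segment)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Return merged half-open (start, end) spans covered by low-entropy windows.

    window and threshold are assumed values chosen for illustration only.
    """
    spans = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            if spans and i <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], i + window)
            else:
                spans.append((i, i + window))
    return spans


# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str):
    """0-based positions of the Asn in each N-X-S/T motif (X is not Pro)."""
    return [m.start() for m in SEQUON.finditer(seq)]


def asn_fraction(segment: str) -> float:
    """Fraction of residues in the segment that are Asn (N)."""
    return segment.count("N") / len(segment) if segment else 0.0
```

Applied to a protein sequence, `low_complexity_windows` gives candidate spans; `asn_fraction` and `sequon_positions` can then be evaluated on each span to see whether sequons concentrate near its center. A real analysis would map the spans onto annotated gp120 regions (V1 to V5, C1 to C5), which this sketch does not attempt.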