RNASA-IMEDIR, Computer Science Faculty, University of A Coruña, 15071 A Coruña, Spain.
Universidad Estatal Amazónica UEA, Puyo, Pastaza, Ecuador.
J Proteome Res. 2018 Mar 2;17(3):1258-1268. doi: 10.1021/acs.jproteome.7b00861. Epub 2018 Feb 5.
The spatial distribution of genes in chromosomes seems not to be random. For instance, only 10% of genes are transcribed from bidirectional promoters in humans, and many more are organized into larger clusters. This raises intriguing questions that different authors have asked before. We add a few more questions in this context, related to gene orientation inversions. Does gene orientation (inversion) follow a random pattern? Is it relevant to biological activity? We define a new kind of network, termed the gene orientation inversion network (GOIN). A GOIN is a complex network that encodes short- and long-range patterns of orientation inversion between pairs of genes in a chromosome. We selected Plasmodium falciparum as a case study because of the high relevance of this parasite to public health (it is the causative agent of malaria). We constructed, for the first time, all of the GOINs for the genome of this parasite. These networks have an average of 383 nodes (genes in one chromosome) and 1314 links (pairs of genes with inverse orientation). We calculated node centralities and other parameters of these networks. These numerical parameters were used to study different properties of gene inversion patterns, for example, distribution, local communities, similarity to Erdős-Rényi random networks, and randomness. We find clues that seem to indicate that gene orientation inversion does not follow a random pattern. We noted that some gene communities in the GOINs tend to group genes encoding RIFIN-related proteins in the proteome of the parasite. RIFIN-like proteins are a second family of clonally variant proteins expressed on the surface of red cells infected with Plasmodium falciparum. Consequently, we used these centralities as inputs to machine learning (ML) models to predict the RIFIN-like activity of 5365 proteins in the proteome of Plasmodium sp. The best linear ML model found discriminates RIFIN-like proteins from other proteins with sensitivity and specificity of 70-80% in both the training and external validation series. All of these results may point to a possible biological relevance of gene orientation inversion that is not directly dependent on genetic sequence information. This work opens the door to the use of GOINs as a tool for studying the structure of chromosomes and protein function in proteome research.
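The abstract does not state the exact rule used to link genes, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that two genes are linked when they lie within a hypothetical `window` of positions on the chromosome and sit on opposite strands. It then computes degree centrality and builds an Erdős-Rényi-style random graph with the same number of nodes and links as a baseline. All function names and the `window` parameter are assumptions introduced here.

```python
import random
from itertools import combinations


def build_goin(strands, window=2):
    """Return the edge set of a toy gene orientation inversion network.

    strands: string of '+'/'-' giving each gene's orientation in
             chromosomal order (node i is the i-th gene).
    window:  HYPOTHETICAL parameter; genes i < j are linked when
             j - i <= window and their orientations differ.
    """
    n = len(strands)
    edges = set()
    for i in range(n):
        for j in range(i + 1, min(i + window, n - 1) + 1):
            if strands[i] != strands[j]:
                edges.add((i, j))
    return edges


def degrees(n, edges):
    """Number of links incident to each node."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def degree_centrality(n, edges):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    return [d / (n - 1) for d in degrees(n, edges)]


def degree_variance(n, edges):
    """Population variance of the degree sequence."""
    deg = degrees(n, edges)
    mean = sum(deg) / n
    return sum((d - mean) ** 2 for d in deg) / n


def random_gnm(n, m, seed=0):
    """Random baseline: m links drawn uniformly from all node pairs."""
    rng = random.Random(seed)
    return set(rng.sample(list(combinations(range(n), 2)), m))


# Toy chromosome with eight genes (made-up orientations, not real data).
toy_strands = "++-+--+-"
goin_edges = build_goin(toy_strands, window=2)
baseline_edges = random_gnm(len(toy_strands), len(goin_edges), seed=0)

print("GOIN links:", sorted(goin_edges))
print("GOIN degree centrality:", degree_centrality(len(toy_strands), goin_edges))
print("degree variance (GOIN vs random):",
      degree_variance(len(toy_strands), goin_edges),
      degree_variance(len(toy_strands), baseline_edges))
```

Comparing a summary statistic such as degree variance between the observed network and many random baselines is one simple way to probe whether the linking pattern departs from randomness. Per-node centralities like those computed here are the kind of numerical descriptor that could then serve as input to a classifier.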