Wu Min, Kwoh Chee-Keong, Li Xiaoli, Zheng Jie
BMC Syst Biol. 2014 Sep 11;8:107. doi: 10.1186/s12918-014-0107-1.
The regulatory mechanism of recombination is one of the most fundamental problems in genomics, with wide applications in genome-wide association studies (GWAS), birth-defect diseases, molecular evolution, cancer research, etc. Recombination events cluster into short genomic regions called "recombination hotspots". Recently, the zinc finger protein PRDM9 was reported to regulate recombination hotspots in the human and mouse genomes. In addition, a 13-mer motif contained in the binding sites of PRDM9 was found to be enriched in human hotspots. However, this 13-mer motif covers only a fraction of hotspots, indicating that PRDM9 is not the only regulator of recombination hotspots. Discovering other regulators of recombination hotspots is therefore an important challenge. Furthermore, recombination is a complex process, so multiple proteins acting together as a molecular machinery, rather than individual proteins, are more likely to carry it out in a precise and stable manner. Extending the prediction from individual trans-regulators to protein complexes is therefore also highly desirable.
In this paper, we introduce a pipeline to identify genes and protein complexes associated with recombination hotspots. First, we prioritize proteins associated with hotspots according to their binding preference between hotspots and coldspots. Second, using the genes identified in the first step as seeds, we apply the Random Walk with Restart (RWR) algorithm to propagate their influence to other proteins in protein-protein interaction (PPI) networks. As a result, many proteins without DNA-binding information are also assigned a score indicating their potential role in recombination hotspots. Third, we construct sub-PPI networks induced by the top genes ranked by RWR for various species (e.g., yeast, human and mouse) and detect protein complexes in those sub-PPI networks.
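The propagation step can be illustrated with a minimal sketch of Random Walk with Restart on a toy network. This is not the authors' implementation: the function name `random_walk_with_restart`, the restart probability of 0.5, the column normalization, and the four-protein path network are all assumptions chosen only for illustration. The sketch iterates p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalized adjacency matrix and p(0) places equal mass on the seed proteins.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Return RWR proximity scores for every node of an undirected network.

    adjacency    : square (n x n) symmetric matrix, nonzero = interaction
    seeds        : list of node indices used as restart nodes
    restart_prob : assumed value; the abstract does not state one
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]

    # Column-normalize so each column sums to 1 (isolated nodes left at 0).
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    W = A / col_sums

    # Restart vector: uniform mass on the seed nodes.
    p0 = np.zeros(n)
    p0[list(seeds)] = 1.0 / len(seeds)

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart_prob) * (W @ p) + restart_prob * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change between iterations
            return p_next
        p = p_next
    return p

# Hypothetical toy PPI network: a path of four proteins 0 - 1 - 2 - 3.
toy_ppi = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Protein 0 plays the role of a seed found by the prioritization step.
scores = random_walk_with_restart(toy_ppi, seeds=[0])
ranking = np.argsort(-scores)  # node indices from highest to lowest score
print(scores, ranking)
```

In this toy example, proteins closer to the seed receive higher scores, which is how proteins lacking DNA-binding data can still be ranked. A sub-network for the third step could then be induced by keeping only the top-ranked nodes.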
The GO term analysis shows that our prioritization methods and the RWR algorithm are capable of identifying novel genes associated with recombination hotspots. The trans-regulators predicted by our pipeline are enriched for epigenetic functions (e.g., histone modifications), highlighting the epigenetic regulatory mechanisms of recombination hotspots. The identified protein complexes also provide candidates for further investigation of the molecular machineries underlying recombination hotspots. Moreover, the experimental data and results are available on our website at http://www.ntu.edu.sg/home/zhengjie/data/RecombinationHotspot/NetPipe/.