Chowdhary Rajesh, Bajic Vladimir B, Dong Difeng, Wong Limsoon, Liu Jun S
Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
BMC Syst Biol. 2010 May 28;4 Suppl 1(Suppl 1):S4. doi: 10.1186/1752-0509-4-S1-S4.
The purpose of this study is to: i) develop a computational model of the promoters of human histone-encoding genes (histone genes for short), an important class of genes that participate in various critical cellular processes; ii) use this model to identify regions across the human genome whose structure is similar to that of histone gene promoters, since such regions could represent genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes; and iii) thereby identify genes that have a high likelihood of being coregulated with the histone genes.
We developed a histone promoter model using a comprehensive collection of histone genes. In a leave-one-out cross-validation test, the model achieved good prediction accuracy (94.1% sensitivity, 92.6% specificity, and 92.8% positive predictive value). We used this model to predict, across the genome, a number of genes whose promoter structures are similar to those of the histone gene promoters. We thus hypothesize that these predicted genes could be coregulated with histone genes. This hypothesis agrees well with the available gene expression, gene ontology, and pathway data. In addition to the promoters of the above-mentioned genes, we found a large number of intergenic regions with a structure similar to that of histone promoters.
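To make the reported evaluation terms concrete, the following is a minimal sketch of leave-one-out cross-validation together with the three metrics quoted above (sensitivity, specificity, positive predictive value). It is not the promoter model from this study: the one-dimensional scores, the `nearest_mean` classifier, and all function names are hypothetical stand-ins chosen only to illustrate the procedure.

```python
import math
from typing import Callable, Dict, List


def confusion_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Sensitivity, specificity and PPV from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "ppv": ratio(tp, tp + fp),          # TP / (TP + FP)
    }


def nearest_mean(train_x: List[float], train_y: List[int], x: float) -> int:
    """Hypothetical toy classifier: assign x to the class with the closer mean."""
    pos = [v for v, label in zip(train_x, train_y) if label == 1]
    neg = [v for v, label in zip(train_x, train_y) if label == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    return 1 if abs(x - mean_pos) < abs(x - mean_neg) else 0


def leave_one_out(
    xs: List[float],
    ys: List[int],
    fit_predict: Callable[[List[float], List[int], float], int],
) -> List[int]:
    """Hold out each sample in turn, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(fit_predict(train_x, train_y, xs[i]))
    return preds


# Made-up scores: 1 = "promoter-like", 0 = "background" (illustrative only).
scores = [0.90, 0.80, 0.85, 0.10, 0.20, 0.15]
labels = [1, 1, 1, 0, 0, 0]
loo_predictions = leave_one_out(scores, labels, nearest_mean)
loo_metrics = confusion_metrics(labels, loo_predictions)
```

Because every sample is predicted by a model that never saw it during training, the resulting metrics estimate performance on unseen sequences rather than fit to the training set.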
This study represents one of the most comprehensive genome-wide computational analyses of the promoters of human histone genes conducted thus far. Our analysis suggests a number of other human genes whose promoter structure is highly similar to that of the histone genes and which are thus highly likely to be coregulated, and consequently coexpressed, with the histone genes. We also found a large number of intergenic regions across the genome whose structure is similar to that of histone gene promoters. These regions may be promoters of as yet unidentified genes, or may represent remote control regions that participate in regulating transcription initiation of histone genes and histone-coregulated genes. While these hypotheses remain to be verified, we believe that they form a useful resource for researchers to further explore the regulation of human histone genes and of the human genome. It is worth noting that the regulatory regions of the human genome remain largely unannotated even today, and this study is an attempt to add to our understanding of histone regulatory regions.