Institute for Biomedical Technologies, National Research Council, Segrate (Milan), Italy.
PLoS One. 2013 Sep 19;8(9):e75146. doi: 10.1371/journal.pone.0075146. eCollection 2013.
Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered key contributors to the regulation of gene expression, and important links with other genomic features involved in DNA rearrangements have been highlighted. Recent Chromosome Conformation Capture (3C) measurements performed with high-throughput sequencing (Hi-C), together with molecular dynamics studies, show a strong correlation between the colocalization and the coregulation of genes, but this important research is hampered by the lack of biologist-friendly analysis and visualisation software. Here, we describe NuChart, an R package that allows the user to annotate and statistically analyse a list of input genes using information derived from Hi-C data, integrating knowledge about genomic features involved in the spatial organization of chromosomes. NuChart works directly with sequenced reads to identify the related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs onto which multi-omics features can be mapped. Predictions of CTCF binding sites, isochores and cryptic Recombination Signal Sequences are provided with the package for direct mapping, although other annotation data in BED format (such as methylation profiles and histone patterns) can also be used. Gene expression data can be automatically retrieved from the Gene Expression Omnibus and ArrayExpress repositories and processed to highlight the expression profiles of genes in the identified neighbourhood. Moreover, statistical inference about the graph structure, and about correlations between its topology and multi-omics features, can be performed using Exponential-family Random Graph Models. The Hi-C fragment visualisation provided by NuChart allows cells in different conditions to be compared, thus offering the possibility of identifying novel biomarkers.
NuChart complies with the Bioconductor standard and is freely available at ftp://fileserver.itb.cnr.it/nuchart.
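To make the idea of a gene-centric neighbourhood graph more concrete, here is a minimal Python sketch of the general technique. It is not NuChart's method or API (NuChart is an R package). It assumes that Hi-C read pairs have already been assigned to restriction fragments and that genes have already been mapped to the fragments they overlap; all names and toy data (`FRAGMENT_GENES`, `READ_PAIRS`, `build_gene_graph`, `neighbourhood`) are hypothetical. Two genes are linked when at least one read pair joins their fragments, and a breadth-first search collects the genes within a given number of contact steps of a seed gene.

```python
from collections import defaultdict, deque

# Hypothetical toy data: fragment id -> genes overlapping that fragment.
FRAGMENT_GENES = {
    "f1": ["GENE_A"],
    "f2": ["GENE_B"],
    "f3": ["GENE_C", "GENE_D"],
    "f4": ["GENE_E"],
}

# Hypothetical Hi-C read pairs, already assigned to (fragment_1, fragment_2).
READ_PAIRS = [("f1", "f2"), ("f2", "f3"), ("f1", "f2"), ("f3", "f4")]


def build_gene_graph(read_pairs, fragment_genes):
    """Link two genes when at least one read pair joins fragments containing them."""
    graph = defaultdict(set)
    for frag_a, frag_b in read_pairs:
        for g1 in fragment_genes.get(frag_a, []):
            for g2 in fragment_genes.get(frag_b, []):
                if g1 != g2:  # ignore self-links
                    graph[g1].add(g2)
                    graph[g2].add(g1)
    return graph


def neighbourhood(graph, seed, depth):
    """Breadth-first search: map each gene within `depth` contact steps of `seed` to its distance."""
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if seen[gene] == depth:  # do not expand beyond the requested depth
            continue
        for nb in graph.get(gene, set()):
            if nb not in seen:
                seen[nb] = seen[gene] + 1
                queue.append(nb)
    return seen


if __name__ == "__main__":
    g = build_gene_graph(READ_PAIRS, FRAGMENT_GENES)
    print(neighbourhood(g, "GENE_A", 2))
```

With the toy data above, the depth-2 neighbourhood of `GENE_A` contains `GENE_B` at distance 1 and `GENE_C` and `GENE_D` at distance 2; `GENE_E` only appears at depth 3. Annotation features (for example CTCF sites or expression values) could then be attached to the nodes of such a graph, which is the kind of mapping the abstract describes.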