Institut Pasteur, Groupe Régulation Spatiale des Génomes, Department of Genomes and Genetics, CNRS, UMR 3525, Institut Pasteur, Unité Imagerie et Modélisation, Department of Cell Biology and Infection, CNRS, URA 2582, F-75015 Paris, France, Institute for Research on Cancer and Ageing of Nice (IRCAN), CNRS UMR 7284 - INSERM U108, Université de Nice Sophia Antipolis, 06107 Nice, France, CNRS, UMR7238, Biologie Computationnelle et Quantitative and Sorbonne Universités, UPMC Univ Paris 06, UMR7238, Biologie Computationnelle et Quantitative, F-75005, Paris, France.
Bioinformatics. 2014 Aug 1;30(15):2105-13. doi: 10.1093/bioinformatics/btu162. Epub 2014 Apr 7.
De novo sequencing of genomes is followed by annotation analyses that aim to identify functional genomic features such as genes, non-coding RNAs or regulatory sequences, taking advantage of diverse datasets. These steps sometimes fail to detect non-coding functional sequences: for example, origins of replication, centromeres and rDNA positions have proven difficult to annotate with high confidence. Here, we demonstrate an unconventional application of the Chromosome Conformation Capture (3C) technique, which typically aims at deciphering the average 3D organization of genomes, by showing how functional information about the sequence can be extracted solely from the chromosome contact map.
Specifically, we describe a combined experimental and bioinformatic procedure that determines the genomic positions of centromeres and ribosomal DNA clusters in yeasts, including species where classical computational approaches fail. For instance, we determined the centromere positions in Naumovozyma castellii, where these coordinates could not be obtained previously. Although computed centromere positions were characterized by conserved synteny with neighboring species, no consensus sequences could be found, suggesting that centromeric binding proteins or mechanisms have significantly diverged. We also used our approach to refine centromere positions in Kuraishia capsulata and to identify rDNA positions in Debaryomyces hansenii. Our study demonstrates how 3C data can be used to complete the functional annotation of eukaryotic genomes.
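To make concrete the idea of reading a centromere position off a contact map, here is a minimal sketch. It is not the procedure from the Supplementary Material. It rests on one assumption, namely that yeast centromeres cluster in the nucleus, so the bin of each chromosome with the strongest inter-chromosomal contact signal is a plausible centromere candidate. The function names, the moving-average smoothing and the synthetic toy matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_centromere_bins(contacts, chrom_ids, smooth=3):
    """Return, per chromosome, the global bin index with the highest
    smoothed inter-chromosomal (trans) contact sum.

    Assumption (not taken from the paper): centromeres colocalise, so
    trans contacts peak near them.
    """
    contacts = np.asarray(contacts, dtype=float)
    chrom_ids = np.asarray(chrom_ids)
    estimates = {}
    for chrom in np.unique(chrom_ids):
        own = chrom_ids == chrom
        # Contacts between each bin of this chromosome and all bins
        # located on the other chromosomes.
        trans = contacts[np.ix_(own, ~own)].sum(axis=1)
        # Moving average to damp single-bin noise; the window is capped
        # at the chromosome length so the output keeps the same size.
        width = max(1, min(smooth, trans.size))
        profile = np.convolve(trans, np.ones(width) / width, mode="same")
        local_peak = int(np.argmax(profile))
        estimates[chrom.item()] = int(np.flatnonzero(own)[local_peak])
    return estimates


def toy_contact_map(n_bins=20, centromeres=(5, 12, 8)):
    """Build a synthetic, noise-free contact map in which contact
    strength decays with the distance of both bins to their centromere.
    Intra-chromosomal entries follow the same formula but are ignored
    by estimate_centromere_bins."""
    n_chrom = len(centromeres)
    chrom_ids = np.repeat(np.arange(n_chrom), n_bins)
    local_pos = np.tile(np.arange(n_bins), n_chrom)
    dist = np.abs(local_pos - np.asarray(centromeres)[chrom_ids])
    contacts = np.exp(-(dist[:, None] + dist[None, :]) / 2.0)
    return contacts, chrom_ids


contacts, chrom_ids = toy_contact_map()
result = estimate_centromere_bins(contacts, chrom_ids)
print(result)
```

On the toy matrix the estimates fall on the planted bins (local bins 5, 12 and 8 of chromosomes 0, 1 and 2). Real 3C data would need normalisation and noise handling that this sketch leaves out.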
The source code is provided in the Supplementary Material. This includes a zipped file with the Python code and a contact matrix of Saccharomyces cerevisiae.
Supplementary data are available at Bioinformatics online.