Filling annotation gaps in yeast genomes using genome-wide contact maps.

Author information

Institut Pasteur, Groupe Régulation Spatiale des Génomes, Department of Genomes and Genetics, CNRS, UMR 3525, Institut Pasteur, Unité Imagerie et Modélisation, Department of Cell Biology and Infection, CNRS, URA 2582, F-75015 Paris, France, Institute for Research on Cancer and Ageing of Nice (IRCAN), CNRS UMR 7284 - INSERM U108, Université de Nice Sophia Antipolis, 06107 Nice, France, CNRS, UMR7238, Biologie Computationnelle et Quantitative and Sorbonne Universités, UPMC Univ Paris 06, UMR7238, Biologie Computationnelle et Quantitative, F-75005, Paris, France.

Publication information

Bioinformatics. 2014 Aug 1;30(15):2105-13. doi: 10.1093/bioinformatics/btu162. Epub 2014 Apr 7.

Abstract

MOTIVATIONS

De novo sequencing of genomes is followed by annotation analyses that aim to identify functional genomic features such as genes, non-coding RNAs or regulatory sequences, taking advantage of diverse datasets. These steps sometimes fail to detect non-coding functional sequences: for example, origins of replication, centromeres and rDNA positions have proven difficult to annotate with high confidence. Here, we demonstrate an unconventional application of the Chromosome Conformation Capture (3C) technique, which typically aims at deciphering the average 3D organization of genomes, by showing how functional information about the sequence can be extracted solely from the chromosome contact map.

RESULTS

Specifically, we describe a combined experimental and bioinformatic procedure that determines the genomic positions of centromeres and ribosomal DNA clusters in yeasts, including species where classical computational approaches fail. For instance, we determined the centromere positions in Naumovozyma castellii, where these coordinates could not be obtained previously. Although computed centromere positions were characterized by conserved synteny with neighboring species, no consensus sequences could be found, suggesting that centromeric binding proteins or mechanisms have significantly diverged. We also used our approach to refine centromere positions in Kuraishia capsulata and to identify rDNA positions in Debaryomyces hansenii. Our study demonstrates how 3C data can be used to complete the functional annotation of eukaryotic genomes.
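
To illustrate the idea of reading centromere positions off a contact map, here is a minimal Python sketch. It is not the authors' supplementary code. It assumes that centromeres cluster together in the nucleus, so that for each chromosome the bin with the strongest smoothed inter-chromosomal (trans) contact signal is a reasonable centromere candidate. The function names `estimate_centromere_bins` and `toy_contact_map`, the smoothing window, and the synthetic matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_centromere_bins(contacts, chrom_of_bin, window=3):
    """Return {chromosome: global bin index with the strongest smoothed trans signal}.

    Illustrative heuristic only (an assumption, not the published procedure).
    """
    contacts = np.asarray(contacts, dtype=float)
    chrom_of_bin = np.asarray(chrom_of_bin)
    kernel = np.ones(window) / window  # simple moving-average smoother
    result = {}
    for chrom in np.unique(chrom_of_bin):
        own = chrom_of_bin == chrom
        # Trans signal: contacts between this chromosome's bins and all other chromosomes.
        trans = contacts[np.ix_(own, ~own)].sum(axis=1)
        smoothed = np.convolve(trans, kernel, mode="same")
        local_peak = int(np.argmax(smoothed))
        # Convert the within-chromosome index back to a global bin index.
        result[chrom.item()] = int(np.flatnonzero(own)[local_peak])
    return result


def toy_contact_map(sizes, centromeres, width=2.0):
    """Build a noise-free synthetic map with contact bumps around assumed centromeres."""
    chrom_of_bin = np.repeat(np.arange(len(sizes)), sizes)
    offsets = np.concatenate(([0], np.cumsum(sizes)[:-1]))
    pos = np.arange(sum(sizes)) - offsets[chrom_of_bin]  # position within chromosome
    cen = np.asarray(centromeres)[chrom_of_bin]
    weight = np.exp(-((pos - cen) ** 2) / (2 * width ** 2))
    contacts = 1.0 + 10.0 * np.outer(weight, weight)  # uniform background plus bump
    return contacts, chrom_of_bin


# Two toy chromosomes of 20 and 30 bins, centromeres at within-chromosome bins 5 and 12.
contacts, chrom_of_bin = toy_contact_map([20, 30], [5, 12])
print(estimate_centromere_bins(contacts, chrom_of_bin))  # {0: 5, 1: 32}
```

On real data the matrix would first need normalization and filtering of poorly covered bins, and a single argmax would likely be replaced by a more robust peak fit; those steps are left out of this sketch.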

AVAILABILITY AND IMPLEMENTATION

The source code is provided in the Supplementary Material. This includes a zipped file with the Python code and a contact matrix of Saccharomyces cerevisiae.

CONTACT

romain.koszul@pasteur.fr

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
