Into the heart of darkness: large-scale clustering of human non-coding DNA.

Author information

Gill Bejerano, David Haussler, Mathieu Blanchette

Affiliations

Center for Biomolecular Science and Engineering, Baskin School of Engineering, University of California in Santa Cruz, Santa Cruz, CA 95064, USA.

Publication information

Bioinformatics. 2004 Aug 4;20 Suppl 1:i40-8. doi: 10.1093/bioinformatics/bth946.

DOI:10.1093/bioinformatics/bth946

PMID:15262779

Abstract

MOTIVATION

It is currently believed that the human genome contains about twice as much functional non-coding sequence as protein-coding sequence, yet our understanding of these regions is very limited.

RESULTS

We examine the intersection between syntenically conserved sequences in the human, mouse and rat genomes and sequence similarities within the human genome itself, in search of families of non-protein-coding elements. For this purpose we develop a graph-theoretic clustering algorithm, akin to the highly successful methods used to elucidate protein sequence family relationships. The algorithm is applied to a highly filtered set of about 700 000 human-rodent evolutionarily conserved regions that do not resemble any known coding sequence; together these regions cover 3.7% of the human genome. From them we obtain roughly 12 000 non-singleton clusters that are dense in significant sequence similarities. Further analysis of genomic location, evidence of transcription and RNA secondary structure shows that many clusters are significantly homogeneous in one or more characteristics. This subset of the highly conserved non-protein-coding elements in the human genome thus contains rich family-like structures that merit in-depth analysis.
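The abstract describes the clustering only at a high level. As a rough illustration of the general idea (conserved elements as graph nodes, significant pairwise similarities as edges), the Python sketch below groups elements into non-singleton connected components and reports each component's edge density as a simple proxy for "dense in significant sequence similarities". Everything here is an assumption for illustration: the function name `cluster_elements`, the `(id_a, id_b, evalue)` hit format and the `max_evalue` cutoff are hypothetical, and plain connected components (single linkage) is a simplified stand-in, not the authors' algorithm.

```python
from collections import defaultdict


def cluster_elements(elements, hits, max_evalue=1e-5):
    """Return non-singleton connected components of a similarity graph.

    Hypothetical sketch: `elements` are element IDs, `hits` are
    (id_a, id_b, evalue) tuples, and `max_evalue` is an assumed cutoff.
    Each result is (sorted member IDs, edge density of the component).
    """
    parent = {e: e for e in elements}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = set()
    for a, b, evalue in hits:
        # Ignore self-hits, weak hits, and IDs outside the filtered element set.
        if a == b or evalue > max_evalue or a not in parent or b not in parent:
            continue
        edges.add(frozenset((a, b)))  # (a, b) and (b, a) count as one edge
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    # Collect the members of each component.
    groups = defaultdict(list)
    for e in parent:
        groups[find(e)].append(e)

    # Count distinct significant edges inside each component.
    edge_count = defaultdict(int)
    for edge in edges:
        a = next(iter(edge))
        edge_count[find(a)] += 1

    clusters = []
    for root, members in groups.items():
        if len(members) < 2:
            continue  # keep only non-singleton clusters
        n_pairs = len(members) * (len(members) - 1) // 2
        clusters.append((sorted(members), edge_count[root] / n_pairs))
    return clusters
```

For example, with hits e1-e2 (E = 1e-10) and e2-e3 (E = 1e-8) below the cutoff and a weak e4-e5 hit (E = 0.5), `cluster_elements(["e1", "e2", "e3", "e4", "e5"], hits)` returns a single cluster `["e1", "e2", "e3"]` with density 2/3; e4 and e5 stay singletons and are dropped. A method aimed at dense clusters would additionally split or reject components whose density is low, which single linkage alone does not do.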

AVAILABILITY

Supplementary material to this work is available at http://www.soe.ucsc.edu/~jill/dark.html

Similar articles

Non-coding RNAs in Ciona intestinalis.

Bioinformatics. 2005 Sep 1;21 Suppl 2:ii77-8. doi: 10.1093/bioinformatics/bti1113.

Divergence of conserved non-coding sequences: rate estimates and relative rate tests.

Mol Biol Evol. 2004 Nov;21(11):2116-21. doi: 10.1093/molbev/msh221. Epub 2004 Jul 28.

Sequencing and genomic annotation of the chicken (Gallus gallus) Hox clusters, and mapping of evolutionarily conserved regions.

Cytogenet Genome Res. 2007;117(1-4):110-9. doi: 10.1159/000103171.

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

ESTviewer: a web interface for visualizing mouse, rat, cattle, pig and chicken conserved ESTs in human genes and human alternatively spliced variants.

Bioinformatics. 2005 May 15;21(10):2510-3. doi: 10.1093/bioinformatics/bti332. Epub 2005 Feb 18.

Small fitness effect of mutations in highly conserved non-coding regions.

Hum Mol Genet. 2005 Aug 1;14(15):2221-9. doi: 10.1093/hmg/ddi226. Epub 2005 Jun 30.

NcDNAlign: plausible multiple alignments of non-protein-coding genomic sequences.

Genomics. 2008 Jul;92(1):65-74. doi: 10.1016/j.ygeno.2008.04.003. Epub 2008 Jun 3.

Comparative genomics reveals unusually long motifs in mammalian genomes.

Bioinformatics. 2006 Jul 15;22(14):e236-42. doi: 10.1093/bioinformatics/btl265.

HomologMiner: looking for homologous genomic groups in whole genomes.

Bioinformatics. 2007 Apr 15;23(8):917-25. doi: 10.1093/bioinformatics/btm048. Epub 2007 Feb 18.

Cited by

Motif distribution in genomes gives insights into gene clustering and co-regulation.

Nucleic Acids Res. 2025 Jan 7;53(1). doi: 10.1093/nar/gkae1178.

Global properties of regulatory sequences are predicted by transcription factor recognition mechanisms.

Genome Biol. 2021 Oct 7;22(1):285. doi: 10.1186/s13059-021-02503-y.

Social Networking of Quasi-Species Consortia drive Virolution via Persistence.

AIMS Microbiol. 2021 Apr 30;7(2):138-162. doi: 10.3934/microbiol.2021010. eCollection 2021.

A Method for the Structure-Based, Genome-Wide Analysis of Bacterial Intergenic Sequences Identifies Shared Compositional and Functional Features.

Genes (Basel). 2019 Oct 22;10(10):834. doi: 10.3390/genes10100834.

Ice ages and butterflyfishes: Phylogenomics elucidates the ecological and evolutionary history of reef fishes in an endemism hotspot.

Ecol Evol. 2018 Oct 23;8(22):10989-11008. doi: 10.1002/ece3.4566. eCollection 2018 Nov.

Short linear motifs - ex nihilo evolution of protein regulation.

Cell Commun Signal. 2015 Nov 21;13:43. doi: 10.1186/s12964-015-0120-z.

Expression of transcribed ultraconserved regions of genome in rat cerebral cortex.

Neurochem Int. 2014 Nov;77:86-93. doi: 10.1016/j.neuint.2014.06.006. Epub 2014 Jun 20.

Genome-wide analysis of promoters: clustering by alignment and analysis of regular patterns.

PLoS One. 2014 Jan 22;9(1):e85260. doi: 10.1371/journal.pone.0085260. eCollection 2014.

Study of Modern Human Evolution via Comparative Analysis with the Neanderthal Genome.

Genomics Inform. 2013 Dec;11(4):230-8. doi: 10.5808/GI.2013.11.4.230. Epub 2013 Dec 31.

HINCUTs in cancer: hypoxia-induced noncoding ultraconserved transcripts.

Cell Death Differ. 2013 Dec;20(12):1675-87. doi: 10.1038/cdd.2013.119. Epub 2013 Sep 13.
