Enrichment of regulatory signals in conserved non-coding genomic sequence.

Author information

Levy S, Hannenhalli S, Workman C

Affiliation

Informatics Research, Celera Genomics Corporation, 45 West Gude Drive, Rockville, MD 20850, USA.

Publication information

Bioinformatics. 2001 Oct;17(10):871-7. doi: 10.1093/bioinformatics/17.10.871.

Abstract

MOTIVATION

Whole genome shotgun sequencing strategies generate sequence reads before any assembly method is applied to produce contiguous sequence. These reads can be used to indicate regions of conservation between closely related species when only one of the genomes has been assembled. Consequently, pairwise sequence alignment methods make it possible to identify novel, non-repetitive, conserved segments in non-coding sequence shared between the assembled human genome and mouse whole genome shotgun sequencing fragments. Such conserved non-coding regions identify potentially functional DNA that could be involved in transcriptional regulation.

RESULTS

Local sequence alignment methods were applied to mouse fragments and the assembled human genome. In addition, transcription factor binding sites were detected by aligning their corresponding positional weight matrices to the sequence regions. These methods were applied to a set of transcripts corresponding to 502 genes associated with a variety of human diseases, taken from the Online Mendelian Inheritance in Man database. Using statistical arguments, we show that conserved non-coding segments are enriched in transcription factor binding sites relative to the sequence background in which the conserved segments are located. This enrichment of binding sites was not observed in coding sequence. Conserved non-coding segments are not extensively repeated in the genome, so their identification provides a rapid means of finding genes with related conserved regions and, consequently, potentially related regulatory mechanisms. Conserved segments in upstream regions contain binding sites that are co-localized in a manner consistent with experimentally known pairwise co-occurrences of transcription factors, and they allow the identification of novel co-occurring transcription factor (TF) pairs. This study provides a methodology and further evidence that conserved non-coding regions are biologically significant, since they contain a statistical enrichment of regulatory signals and of signal pairs that enable the construction of regulatory models for human genes.
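The binding-site detection step described above, matching a positional weight matrix against sequence regions, can be illustrated with a minimal sketch. This is not the authors' implementation: the count matrix, the uniform background, the pseudocount, and the score threshold are all hypothetical values chosen for illustration, and the function names (log_odds_matrix, scan, site_density) are invented here. The density helper only suggests one crude way to compare a conserved segment with its surrounding background; the abstract does not specify which statistical test was used.

```python
import math

# Hypothetical count matrix for a 4 bp motif (one dict per position).
# These counts are illustrative only and do not come from the paper.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]
# Assumed uniform base composition for the background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(
                ((column[base] + pseudocount) / total) / background[base]
            )
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for each window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def site_density(sequence, matrix, threshold):
    """Predicted sites per base, for crude region-to-region comparison."""
    return len(scan(sequence, matrix, threshold)) / max(len(sequence), 1)


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS, BACKGROUND)
    for start, score in scan("TTACGTTT", pwm, threshold=4.0):
        print(start, round(score, 2))
```

In this sketch, comparing site_density for a conserved segment against the same quantity for its flanking sequence gives a rough sense of enrichment. A real analysis would scan both strands, use curated matrices, and apply a proper significance test.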

CONTACT

samuel.levy@celera.com.
