MotifHyades: expectation maximization for de novo DNA motif pair discovery on paired sequences.

Affiliation

Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong.

Publication information

Bioinformatics. 2017 Oct 1;33(19):3028-3035. doi: 10.1093/bioinformatics/btx381.

Abstract

MOTIVATION

In higher eukaryotes, protein-DNA binding interactions are the central activities in gene regulation. In particular, DNA motifs such as transcription factor binding sites are the key components in gene transcription. With chromatin interaction data now available, computational methods are needed to systematically identify the coupling DNA motif pairs enriched on long-range chromatin-interacting sequence pairs (e.g. promoter-enhancer pairs).

RESULTS

To fill this void, a novel probabilistic model, MotifHyades, is proposed and developed for de novo DNA motif pair discovery on paired sequences. In particular, two expectation maximization algorithms are derived for efficient model training with linear computational complexity. Under diverse scenarios, MotifHyades is demonstrated to be faster and more accurate than the existing ad hoc computational pipeline. In addition, MotifHyades is applied to the human K562 cell line, where it discovers thousands of DNA motif pairs with a higher gold-standard motif matching ratio, higher DNase accessibility and higher evolutionary conservation than the previously reported ones. Lastly, it has been run on five other human cell lines (i.e. GM12878, HeLa-S3, HUVEC, IMR90 and NHEK), revealing thousands more novel DNA motif pairs, which are characterized across a broad spectrum of genomic features on long-range promoter-enhancer pairs.
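The abstract does not spell out the model, so the following is only a rough sketch of how expectation maximization for motif pairs on paired sequences might look, not the authors' derivation. It assumes a deliberately simplified model: each sequence pair is generated by one of `k` motif pairs, each side contains exactly one motif site at a uniformly random position, all motifs share one fixed width, and the background is uniform over A, C, G and T. Every name here (`em_motif_pairs`, `site_likelihoods`, `random_pwm`, `pseudo`) is hypothetical. Each iteration visits every window of every sequence once per motif pair, so the cost per iteration grows linearly with the total sequence length.

```python
import math
import random

BASES = "ACGT"


def site_likelihoods(seq, pwm):
    """Motif-vs-uniform-background likelihood ratio of every window in seq."""
    width = len(pwm)
    ratios = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            ratio *= pwm[j][BASES.index(seq[start + j])] / 0.25
        ratios.append(ratio)
    return ratios


def random_pwm(width, rng):
    """Random position weight matrix; each row is a distribution over BASES."""
    rows = []
    for _ in range(width):
        raw = [rng.random() + 0.5 for _ in BASES]
        total = sum(raw)
        rows.append([v / total for v in raw])
    return rows


def normalize(counts):
    """Turn per-column soft counts into per-column probabilities."""
    return [[v / sum(row) for v in row] for row in counts]


def em_motif_pairs(pairs, k, width, iters=30, seed=0, pseudo=0.01):
    """Toy EM for k motif pairs (assumed model, see the text above).

    pairs: list of (x, y) strings over ACGT, each at least `width` long.
    Returns (left_pwms, right_pwms, mixing_weights, log_likelihood_ratio).
    """
    rng = random.Random(seed)
    left = [random_pwm(width, rng) for _ in range(k)]
    right = [random_pwm(width, rng) for _ in range(k)]
    mix = [1.0 / k] * k
    loglik = 0.0
    for _ in range(iters):
        # Soft counts start at a small pseudocount to avoid zero probabilities.
        cl = [[[pseudo] * 4 for _ in range(width)] for _ in range(k)]
        cr = [[[pseudo] * 4 for _ in range(width)] for _ in range(k)]
        resp_sum = [0.0] * k
        loglik = 0.0
        for x, y in pairs:
            # E-step: the posterior over motif pairs couples both sides.
            lx = [site_likelihoods(x, left[c]) for c in range(k)]
            ly = [site_likelihoods(y, right[c]) for c in range(k)]
            joint = [mix[c] * (sum(lx[c]) / len(lx[c])) * (sum(ly[c]) / len(ly[c]))
                     for c in range(k)]
            total = sum(joint)
            loglik += math.log(total)
            for c in range(k):
                resp = joint[c] / total
                resp_sum[c] += resp
                for seq, sites, counts in ((x, lx[c], cl[c]), (y, ly[c], cr[c])):
                    norm = sum(sites)
                    for start, s in enumerate(sites):
                        # Responsibility times posterior of this site position.
                        weight = resp * s / norm
                        for j in range(width):
                            counts[j][BASES.index(seq[start + j])] += weight
        # M-step: re-estimate both PWMs of each pair and the mixing weights.
        left = [normalize(cl[c]) for c in range(k)]
        right = [normalize(cr[c]) for c in range(k)]
        mix = [v / len(pairs) for v in resp_sum]
    return left, right, mix, loglik
```

Calling `em_motif_pairs(pairs, k=2, width=6)` on a list of `(promoter, enhancer)` strings returns two lists of position weight matrices plus the estimated share of sequence pairs explained by each motif pair. Like any EM procedure it only reaches a local optimum, so in practice several random seeds would be tried.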

AVAILABILITY AND IMPLEMENTATION

The matrix-algebra-optimized versions of MotifHyades and the discovered DNA motif pairs are available at http://bioinfo.cs.cityu.edu.hk/MotifHyades.

CONTACT

kc.w@cityu.edu.hk.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
