SArKS: de novo discovery of gene expression regulatory motif sites and domains by suffix array kernel smoothing.

Affiliations

Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, TX, USA.

Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.

Publication information

Bioinformatics. 2019 Oct 15;35(20):3944-3952. doi: 10.1093/bioinformatics/btz198.

DOI: 10.1093/bioinformatics/btz198

PMID: 30903136

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7963082/

Abstract

MOTIVATION

We set out to develop an algorithm that can mine differential gene expression data to identify candidate cell type-specific DNA regulatory sequences. Differential expression is usually quantified as a continuous score (fold-change, test statistic or P-value) comparing biological classes. Unlike existing approaches, our de novo strategy, termed SArKS, applies non-parametric kernel smoothing to uncover promoter motif sites that correlate with elevated differential expression scores. SArKS detects motif k-mers by smoothing sequence scores over sequence similarity. A second round of smoothing over spatial proximity reveals multi-motif domains (MMDs). Discovered motif sites can then be merged or extended based on adjacency within MMDs. False positive rates are estimated and controlled by permutation testing.
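
The first smoothing step and the permutation test described above can be illustrated with a small sketch. This is not the API of the sarks package, and it makes simplifying assumptions that the abstract does not confirm: a naive sorted-suffix construction stands in for a real suffix array algorithm, a uniform window over neighboring suffix ranks stands in for the kernel, and the permutation threshold is simply the largest smoothed score seen under shuffled scores. All function names (`build_index`, `smooth_scores`, `permutation_threshold`) and the toy sequences and scores are hypothetical.

```python
import random


def build_index(seqs, sep="$"):
    """Concatenate sequences and sort suffix start positions lexicographically.

    Returns (text, owner, order): owner[i] is the index of the sequence that
    position i belongs to (None for separators); order lists the non-separator
    positions in suffix-sorted order. Naive construction, for illustration only.
    """
    text = sep.join(seqs) + sep
    owner = []
    for idx, seq in enumerate(seqs):
        owner.extend([idx] * len(seq))
        owner.append(None)  # separator position
    order = sorted(range(len(text)), key=lambda i: text[i:])
    order = [i for i in order if owner[i] is not None]
    return text, owner, order


def smooth_scores(owner, order, scores, half_window):
    """Average sequence scores over neighboring suffix ranks.

    Suffixes that are adjacent in sorted order share a prefix, so a window
    over ranks acts as smoothing over sequence similarity (assumed uniform
    kernel).
    """
    ranked = [scores[owner[i]] for i in order]
    n = len(ranked)
    smoothed = {}
    for rank, pos in enumerate(order):
        lo = max(0, rank - half_window)
        hi = min(n, rank + half_window + 1)
        smoothed[pos] = sum(ranked[lo:hi]) / (hi - lo)
    return smoothed


def permutation_threshold(owner, order, scores, half_window, n_perm=100, seed=0):
    """Largest smoothed score observed when scores are shuffled across sequences."""
    rng = random.Random(seed)
    shuffled = list(scores)
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null = smooth_scores(owner, order, shuffled, half_window)
        null_maxima.append(max(null.values()))
    return max(null_maxima)


# Toy data (hypothetical): one differential expression score per promoter.
seqs = ["ACGTACGT", "TTACGTAA", "GGGGCCCC", "CCGGTTAA"]
scores = [2.0, 1.5, -1.0, -0.5]

text, owner, order = build_index(seqs)
smoothed = smooth_scores(owner, order, scores, half_window=2)
threshold = permutation_threshold(owner, order, scores, half_window=2, n_perm=50)

# Report the three highest-scoring positions and the 4-mer starting at each
# (the slice may run into a separator near a sequence end).
peaks = sorted(smoothed, key=smoothed.get, reverse=True)[:3]
for pos in peaks:
    print(pos, text[pos:pos + 4], round(smoothed[pos], 3), smoothed[pos] > threshold)
```

Positions whose smoothed score exceeds the permutation threshold would be the candidate motif sites in this simplified picture. The second round of smoothing over spatial proximity, which the abstract uses to find MMDs, is not shown here.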

RESULTS

We applied SArKS to published gene expression data representing distinct neocortical neuron classes in Mus musculus and interneuron developmental states in Homo sapiens. When benchmarked against several existing algorithms using a cross-validation procedure, SArKS identified larger motif sets that formed the basis for regression models with higher correlative power.

AVAILABILITY AND IMPLEMENTATION

https://github.com/denniscwylie/sarks.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
