The folded k-spectrum kernel: A machine learning approach to detecting transcription factor binding sites with gapped nucleotide dependencies.

Author information

Elmas Abdulkadir, Wang Xiaodong, Dresch Jacqueline M

Affiliations

Department of Electrical Engineering, Columbia University, New York, NY, United States of America.

Department of Mathematics and Computer Science, Clark University, Worcester, MA, United States of America.

Publication information

PLoS One. 2017 Oct 5;12(10):e0185570. doi: 10.1371/journal.pone.0185570. eCollection 2017.

DOI:10.1371/journal.pone.0185570

PMID:28982128

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5628859/

Abstract

Understanding the molecular machinery involved in transcriptional regulation is central to improving our knowledge of an organism's development, disease, and evolution. The building blocks of this complex molecular machinery are an organism's genomic DNA sequence and transcription factor proteins. Despite the vast amount of sequence data now available for many model organisms, predicting where transcription factors bind, often referred to as 'motif detection', is still incredibly challenging. In this study, we develop a novel bioinformatic approach to binding site prediction. We do this by extending pre-existing SVM approaches in an unbiased way to include all possible gapped k-mers, representing different combinations of complex nucleotide dependencies within binding sites. We show the advantages of this new approach over existing SVM approaches through a rigorous set of cross-validation experiments. We also demonstrate the effectiveness of our new approach by reporting its improved performance on a set of 127 genomic regions known to regulate gene expression along the anterior-posterior axis in early Drosophila embryos.
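To make the phrase "all possible gapped k-mers" concrete, the following is a minimal sketch of how gapped k-mer features might be counted and compared between two sequences. It is an illustration only and is not the authors' folded k-spectrum kernel: the function names `gapped_kmer_counts` and `linear_kernel`, the use of `N` as the gap symbol, and the choice to allow gaps at any position (including the ends of a window) are all assumptions made here for the example.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, k, gap_char="N"):
    """Count gapped patterns from every length-k window of seq.

    Hypothetical helper: each window contributes its contiguous k-mer
    plus every variant with 1 to k-1 positions masked by gap_char.
    """
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # n_gaps = 0 keeps the plain k-mer; at least one position stays informative
        for n_gaps in range(k):
            for masked in combinations(range(k), n_gaps):
                pattern = "".join(
                    gap_char if j in masked else base
                    for j, base in enumerate(window)
                )
                counts[pattern] += 1
    return counts


def linear_kernel(a, b):
    """Inner product of two pattern-count vectors (shared patterns only)."""
    return sum(count * b.get(pattern, 0) for pattern, count in a.items())


if __name__ == "__main__":
    x = gapped_kmer_counts("ACGT", 3)
    y = gapped_kmer_counts("ACCT", 3)
    print(len(x), "distinct patterns in the first sequence")
    print("kernel value:", linear_kernel(x, y))
```

A kernel matrix built from values like these could, in principle, be passed to an SVM that accepts precomputed kernels. Each window yields 2^k - 1 patterns in this sketch, so the cost grows quickly with k; how the published method handles that is not stated in the abstract.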
