The folded k-spectrum kernel: A machine learning approach to detecting transcription factor binding sites with gapped nucleotide dependencies.

Author information

Elmas Abdulkadir, Wang Xiaodong, Dresch Jacqueline M

Affiliations

Department of Electrical Engineering, Columbia University, New York, NY, United States of America.

Department of Mathematics and Computer Science, Clark University, Worcester, MA, United States of America.

Publication information

PLoS One. 2017 Oct 5;12(10):e0185570. doi: 10.1371/journal.pone.0185570. eCollection 2017.

DOI: 10.1371/journal.pone.0185570
PMID: 28982128
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5628859/
Abstract

Understanding the molecular machinery involved in transcriptional regulation is central to improving our knowledge of an organism's development, disease, and evolution. The building blocks of this complex molecular machinery are an organism's genomic DNA sequence and transcription factor proteins. Despite the vast amount of sequence data now available for many model organisms, predicting where transcription factors bind, often referred to as 'motif detection', is still incredibly challenging. In this study, we develop a novel bioinformatic approach to binding site prediction. We do this by extending pre-existing SVM approaches in an unbiased way to include all possible gapped k-mers, representing different combinations of complex nucleotide dependencies within binding sites. We show the advantages of this new approach over existing SVM approaches through a rigorous set of cross-validation experiments. We also demonstrate the effectiveness of our new approach by reporting on its improved performance on a set of 127 genomic regions known to regulate gene expression along the anterior-posterior axis in early Drosophila embryos.
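The abstract does not spell out how the folded k-spectrum kernel is defined, so the following is only a simplified, hypothetical illustration of the general idea it builds on: counting gapped k-mers (k informative positions chosen inside a window of length `window_len`, with the remaining positions treated as wildcards) and comparing two sequences by the inner product of their counts, a spectrum-style kernel that an SVM could consume as a precomputed Gram matrix. The helper names, the `.` wildcard notation, and the parameters `window_len` and `k` are assumptions for illustration; this is not the authors' code or their exact kernel.

```python
import math
from collections import Counter
from itertools import combinations


def gapped_kmer_features(seq: str, window_len: int, k: int) -> Counter:
    """Hypothetical helper: count every gapped k-mer in seq.

    Each length-window_len window contributes one pattern per choice of
    k kept positions; the other window_len - k positions become '.'
    wildcards. This is an illustrative feature map, not the paper's kernel.
    """
    if not 0 < k <= window_len:
        raise ValueError("need 0 < k <= window_len")
    seq = seq.upper()
    # Every way of choosing which k positions inside the window are informative.
    masks = [set(m) for m in combinations(range(window_len), k)]
    counts: Counter = Counter()
    for start in range(len(seq) - window_len + 1):
        window = seq[start:start + window_len]
        for kept in masks:
            pattern = "".join(ch if i in kept else "." for i, ch in enumerate(window))
            counts[pattern] += 1
    return counts


def spectrum_kernel(fa: Counter, fb: Counter) -> int:
    """Inner product of two sparse gapped k-mer count vectors."""
    if len(fa) > len(fb):
        fa, fb = fb, fa  # iterate over the smaller feature map
    # Counter returns 0 for missing keys, so unshared patterns add nothing.
    return sum(count * fb[pattern] for pattern, count in fa.items())


def normalized_kernel(x: str, y: str, window_len: int = 3, k: int = 2) -> float:
    """Cosine-normalized kernel: K(x, y) / sqrt(K(x, x) * K(y, y))."""
    fx = gapped_kmer_features(x, window_len, k)
    fy = gapped_kmer_features(y, window_len, k)
    denom = math.sqrt(spectrum_kernel(fx, fx) * spectrum_kernel(fy, fy))
    return spectrum_kernel(fx, fy) / denom if denom else 0.0


if __name__ == "__main__":
    print(gapped_kmer_features("ACGT", window_len=3, k=2))
    # ACGT and ACGA share 4 of their 6 gapped patterns, so this is 4/6.
    print(normalized_kernel("ACGT", "ACGA"))
```

Setting `k == window_len` keeps every position and reduces the sketch to the ordinary contiguous k-spectrum; smaller `k` admits more wildcard positions, and each window then yields C(window_len, k) patterns, which is why the feature space grows quickly as gaps are allowed.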


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd3a/5628859/49dcf83cf043/pone.0185570.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd3a/5628859/4c88008a9f4b/pone.0185570.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd3a/5628859/ae056304ce17/pone.0185570.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd3a/5628859/02307d29757b/pone.0185570.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd3a/5628859/d73b0f72c3d9/pone.0185570.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd3a/5628859/1f8d8a718993/pone.0185570.g006.jpg

Similar articles

1
The folded k-spectrum kernel: A machine learning approach to detecting transcription factor binding sites with gapped nucleotide dependencies.
PLoS One. 2017 Oct 5;12(10):e0185570. doi: 10.1371/journal.pone.0185570. eCollection 2017.
2
A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites.
Nucleic Acids Res. 2006;34(20):5730-9. doi: 10.1093/nar/gkl585. Epub 2006 Oct 13.
3
MD-SVM: a novel SVM-based algorithm for the motif discovery of transcription factor binding sites.
BMC Bioinformatics. 2019 May 1;20(Suppl 7):200. doi: 10.1186/s12859-019-2735-3.
4
Predicting protein-binding regions in RNA using nucleotide profiles and compositions.
BMC Syst Biol. 2017 Mar 14;11(Suppl 2):16. doi: 10.1186/s12918-017-0386-4.
5
DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.
Bioinformatics. 2017 Oct 1;33(19):3003-3010. doi: 10.1093/bioinformatics/btx336.
6
Integrating genomic data to predict transcription factor binding.
Genome Inform. 2005;16(1):83-94.
7
Context specific transcription factor prediction.
Ann Biomed Eng. 2007 Jun;35(6):1053-67. doi: 10.1007/s10439-007-9268-z. Epub 2007 Mar 22.
8
Machine Learning Prediction of Non-Coding Variant Impact in Human Retinal cis-Regulatory Elements.
Transl Vis Sci Technol. 2022 Apr 1;11(4):16. doi: 10.1167/tvst.11.4.16.
9
Systematic analysis of binding of transcription factors to noncoding variants.
Nature. 2021 Mar;591(7848):147-151. doi: 10.1038/s41586-021-03211-0. Epub 2021 Jan 27.
10
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.
Nucleic Acids Res. 2013 Jul;41(Web Server issue):W544-56. doi: 10.1093/nar/gkt519. Epub 2013 Jun 14.

Cited by

1
Experimental approaches to investigate biophysical interactions between homeodomain transcription factors and DNA.
Biochim Biophys Acta Gene Regul Mech. 2025 Mar;1868(1):195074. doi: 10.1016/j.bbagrm.2024.195074. Epub 2024 Dec 5.
2
Of numbers and movement - understanding transcription factor pathogenesis by advanced microscopy.
Dis Model Mech. 2020 Dec 29;13(12):dmm046516. doi: 10.1242/dmm.046516.
3
gammaBOriS: Identification and Taxonomic Classification of Origins of Replication in Gammaproteobacteria using Motif-based Machine Learning.
Sci Rep. 2020 Apr 21;10(1):6727. doi: 10.1038/s41598-020-63424-7.
4
REDfly: the transcriptional regulatory element database for Drosophila.
Nucleic Acids Res. 2019 Jan 8;47(D1):D828-D834. doi: 10.1093/nar/gky957.