
regCNN: identifying genome-wide cis-regulatory modules via integrating the local patterns in epigenetic marks and transcription factor binding motifs.

Authors

Yang Tzu-Hsien, Yang Ya-Chiao, Tu Kai-Chi

Affiliation

Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.

Publication information

Comput Struct Biotechnol J. 2021 Dec 18;20:296-308. doi: 10.1016/j.csbj.2021.12.015. eCollection 2022.

DOI:10.1016/j.csbj.2021.12.015
PMID:35035784
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8724954/
Abstract

Transcription regulation in metazoa is controlled by the binding of transcription factors (TFs) or regulatory proteins to specific modular DNA regulatory sequences called cis-regulatory modules (CRMs). Understanding the distribution of CRMs on a genomic scale is essential for constructing the metazoan transcriptional regulatory networks that help diagnose genetic disorders. While traditional reporter-assay approaches to CRM identification can provide an in-depth understanding of the functions of some CRMs, these methods are usually costly and low-throughput. It is generally believed that reliable CRM predictions can be made by integrating diverse genomic data. Hence, researchers often resort first to computational algorithms for genome-wide CRM screening before conducting specific experiments. However, existing methods for finding potential CRMs are limited by low sensitivity, poor prediction accuracy, or long computation time arising from the combinatorial complexity of transcription factor binding site (TFBS) composition. To overcome these obstacles, we designed a novel CRM identification pipeline called regCNN that considers the base-by-base local patterns in TF binding motifs and epigenetic profiles. On the test set, regCNN achieves an accuracy of 84.5% and an auROC of 92.5% in CRM identification. By further considering local patterns in epigenetic profiles and TF binding motifs, it improves the auROC by 4.7 percentage points (92.5% versus 87.8%) over a pure multi-layer perceptron model based on average values. We also demonstrated that regCNN outperforms all currently available tools by at least 11.3% in auROC. Finally, regCNN is verified to be robust to its resizing-window hyperparameter when dealing with the variable lengths of CRMs. The regCNN model can be downloaded at http://cobisHSS0.im.nuk.edu.tw/regCNN/.
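The abstract contrasts base-by-base local patterns with average-value features and mentions a resizing window for variable-length CRMs. The following is a minimal sketch of those two ideas and of the auROC metric, not the regCNN implementation. The function names `resize_profile` and `auroc`, the use of linear interpolation for resizing, and the toy signals are all assumptions made for illustration. The abstract does not state how regCNN resizes its inputs.

```python
import numpy as np


def resize_profile(signal, window):
    """Resample a per-base signal of any length to `window` points.

    Linear interpolation is an assumption for this sketch; the source
    does not say which resizing scheme regCNN uses.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size == 1:
        return np.full(window, signal[0])
    src = np.linspace(0.0, 1.0, num=signal.size)
    dst = np.linspace(0.0, 1.0, num=window)
    return np.interp(dst, src, signal)


def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) form."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(scores.size)
    ranks[order] = np.arange(1, scores.size + 1)
    # Tied scores share the average of their ranks.
    for value in np.unique(scores):
        tie = scores == value
        ranks[tie] = ranks[tie].mean()
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Two hypothetical epigenetic-mark profiles of different lengths.
# They share the same mean signal but have different local shapes.
peak = np.array([0, 0, 1, 4, 1, 0, 0, 0, 0, 0], dtype=float)
flat = np.full(15, 0.6)

# An average-value feature cannot tell the two regions apart ...
same_mean = np.isclose(peak.mean(), flat.mean())

# ... while fixed-width resized profiles keep the local pattern.
peak_resized = resize_profile(peak, 8)
flat_resized = resize_profile(flat, 8)
shapes_differ = not np.allclose(peak_resized, flat_resized)

# Toy classifier scores and labels (1 = CRM, 0 = background).
toy_auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In this toy setup the two regions are indistinguishable by their mean signal, yet their resized profiles differ, which is the kind of information a convolutional model can use and an average-value multi-layer perceptron cannot.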
