StackTADB: a stacking-based ensemble learning model for predicting the boundaries of topologically associating domains (TADs) accurately in fruit flies.

Affiliations

College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China.

School of Software, Shandong University, Jinan, 250101, Shandong, China.

Publication information

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac023.

DOI: 10.1093/bib/bbac023
PMID: 35181793

Abstract

Chromosomes are composed of many distinct chromatin domains, referred to variably as topological domains or topologically associating domains (TADs). These domains are stable across cell types and highly conserved across species, so they are considered the basic units of chromosome folding and an important secondary structure in chromosome organization. However, identifying TAD boundaries remains a great challenge because Hi-C experiments are costly and Hi-C data have low resolution. In this study, we propose a novel ensemble learning framework, termed StackTADB, for predicting the boundaries of TADs. StackTADB integrates four base classifiers: Random Forest, Logistic Regression, K-Nearest Neighbor and Support Vector Machine. A series of experiments on the data set from a previous study shows that StackTADB achieves the best performance on six metrics (AUC, Accuracy, MCC, Precision, Recall and F1 score) and is superior to existing methods. In addition, a comparison of multiple feature types shows that k-mer-based features play an essential role in predicting TAD boundaries in fruit flies, and we apply the SHapley Additive exPlanations (SHAP) framework to interpret the predictions of StackTADB and identify why k-mer-based features are vital. The experimental results show that subsequences matching the BEAF-32 motif play a crucial role in predicting the boundaries of TADs. The source code is freely available at https://github.com/HaoWuLab-Bioinformatics/StackTADB and the webserver of StackTADB is freely available at http://hwtad.sdu.edu.cn:8002/StackTADB.
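To make the k-mer-based features and the evaluation metrics named in the abstract more concrete, here is a minimal Python sketch. It is not taken from the StackTADB repository: the function names `kmer_frequencies` and `binary_metrics`, the default `k=3`, the frequency normalization and the handling of non-ACGT characters are all illustrative assumptions. It shows one common way to turn a DNA sequence into a fixed-length k-mer frequency vector, and how Accuracy, Precision, Recall, F1 score and MCC follow from the confusion-matrix counts. AUC is left out because it needs predicted scores rather than hard labels.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(sequence, k=3):
    """Return normalized k-mer frequencies over the alphabet ACGT.

    Hypothetical helper (not the authors' code). The vector has 4**k
    entries in lexicographic order (AAA, AAC, AAG, ...). Windows that
    contain any other character, such as N, are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    # An all-zero vector is returned when no valid window exists.
    return [counts[km] / total if total else 0.0 for km in kmers]


def binary_metrics(y_true, y_pred):
    """Compute five of the six metrics from hard 0/1 labels.

    Hypothetical helper; 1 marks a TAD boundary and 0 a non-boundary.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Matthews correlation coefficient from the confusion matrix.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Usage: encode a toy sequence and score a toy prediction.
features = kmer_frequencies("ACGTACGTTAGC", k=3)  # 64-dimensional vector
scores = binary_metrics([1, 1, 0, 0], [1, 0, 0, 1])
```

Feature vectors of this kind could then be passed to any stacking implementation that combines the four base classifiers; the choice of meta-learner is not stated in the abstract and would be a further assumption.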


