A hybrid approach for predicting transcription factors.

Author information

Patiyal Sumeet, Tiwari Palak, Ghai Mohit, Dhapola Aman, Dhall Anjali, Raghava Gajendra P S

Affiliation

Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.

Publication information

Front Bioinform. 2024 Jul 25;4:1425419. doi: 10.3389/fbinf.2024.1425419. eCollection 2024.

DOI:10.3389/fbinf.2024.1425419

PMID:39119181

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11306938/

Abstract

Transcription factors are essential DNA-binding proteins that regulate the transcription rate of several genes and control the expression of genes inside a cell. The prediction of transcription factors with high precision is important for understanding biological processes such as cell differentiation, intracellular signaling, and cell-cycle control. In this study, we developed a hybrid method that combines alignment-based and alignment-free methods for predicting transcription factors with higher accuracy. All models have been trained, tested, and evaluated on a large dataset that contains 19,406 transcription factors and 523,560 non-transcription factor protein sequences. To avoid biases in evaluation, the datasets were divided into training and validation/independent datasets, where 80% of the data was used for training, and the remaining 20% was used for external validation. In the case of alignment-free methods, models were developed using machine learning techniques and the composition-based features of a protein. Our best alignment-free model obtained an AUC of 0.97 on an independent dataset. In the case of the alignment-based method, we used BLAST at different cut-offs to predict the transcription factors. Although the alignment-based method demonstrated excellent performance, it was unable to cover all transcription factors due to instances of no hits. To combine the strengths of both methods, we developed a hybrid method that combines alignment-free and alignment-based methods. In the hybrid method, we added the scores of the alignment-free and alignment-based methods and achieved a maximum AUC of 0.99 on the independent dataset. The method proposed in this study performs better than existing methods. We incorporated the best models in the webserver/Python Package Index/standalone package of "TransFacPred" (https://webs.iiitd.edu.in/raghava/transfacpred).
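
The score-addition step described in the abstract can be illustrated with a minimal Python sketch. The abstract states only that the alignment-free and alignment-based scores were added; everything else here is an assumption made for illustration: the function names, the use of plain amino acid composition as the composition-based feature, the ±0.5 BLAST contribution, and the 0.5 decision threshold. None of this is taken from the TransFacPred package.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Percentage of each standard residue: one simple composition-based feature."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def blast_score(hit_label, tf_bonus=0.5, non_tf_penalty=-0.5):
    """Map the top BLAST hit to a score.

    hit_label is "TF", "non-TF", or None when BLAST returns no hit.
    The +0.5 / -0.5 values are hypothetical, not taken from the paper.
    """
    if hit_label == "TF":
        return tf_bonus
    if hit_label == "non-TF":
        return non_tf_penalty
    return 0.0  # no hit: the alignment-free score decides alone


def hybrid_score(ml_probability, hit_label):
    """Add the alignment-free (ML) score and the alignment-based (BLAST) score."""
    return ml_probability + blast_score(hit_label)


def predict_is_tf(ml_probability, hit_label, threshold=0.5):
    """Call a sequence a transcription factor if the hybrid score reaches the
    (assumed) threshold."""
    return hybrid_score(ml_probability, hit_label) >= threshold


if __name__ == "__main__":
    # A weak ML score is rescued by a BLAST hit against a known TF.
    print(predict_is_tf(0.3, "TF"))
    # With no BLAST hit, the ML probability is used unchanged.
    print(predict_is_tf(0.3, None))
```

In this sketch a sequence with no BLAST hit falls back to the alignment-free score, which reflects the reason the abstract gives for combining the two methods: BLAST alone could not cover every transcription factor.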

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84f7/11306938/9a15bb8a6f49/fbinf-04-1425419-g001.jpg
