The combination approach of SVM and ECOC for powerful identification and classification of transcription factor.

Authors

Zheng Guangyong, Qian Ziliang, Yang Qing, Wei Chaochun, Xie Lu, Zhu Yangyong, Li Yixue

Affiliation

Department of Computing and Information Technology, Fudan University, 220 Handan Road, Shanghai 200433, PR China.

Publication information

BMC Bioinformatics. 2008 Jun 16;9:282. doi: 10.1186/1471-2105-9-282.

DOI:10.1186/1471-2105-9-282

PMID:18554421

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2440765/

Abstract

BACKGROUND

Transcription factors (TFs) are core functional proteins that play important roles in controlling gene expression, and they are key components in the construction of gene regulatory networks. Traditionally, they have been identified and classified experimentally. To save time and reduce costs, many computational methods have been developed to identify TFs among new proteins and to classify the resulting TFs. Although these methods have facilitated TF screening to some extent, low accuracy remains a common problem. With the rapidly growing number of new proteins, more precise algorithms for identifying TFs and classifying them are in high demand.

RESULTS

The support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites employed as feature vectors. The error-correcting output coding (ECOC) algorithm, which originated in information and communication engineering, was combined with the SVM methodology for TF classification. The overall success rates of identification and classification reached 88.22% and 97.83%, respectively. Finally, a web site was constructed to give users access to our tools (see the Availability and requirements section for the URL).
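As an illustration of how protein domains and functional sites can serve as feature vectors, here is a minimal sketch. It assumes a simple presence/absence encoding, which the abstract does not confirm, and every protein name and annotation ID (`protein_1`, `DOMAIN_A`, `SITE_X`, and so on) is a hypothetical placeholder rather than a real database identifier.

```python
def build_vocabulary(proteins):
    """Collect every annotation ID seen in the data, in sorted order."""
    return sorted({term for annots in proteins.values() for term in annots})


def encode(annotations, vocabulary):
    """Return a 0/1 vector: 1 where the protein carries the annotation."""
    present = set(annotations)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical proteins with hypothetical domain and functional-site IDs.
proteins = {
    "protein_1": ["DOMAIN_A", "SITE_X"],
    "protein_2": ["DOMAIN_B"],
    "protein_3": ["DOMAIN_A", "DOMAIN_B", "SITE_Y"],
}

vocabulary = build_vocabulary(proteins)
vectors = {name: encode(annots, vocabulary) for name, annots in proteins.items()}

for name, vector in vectors.items():
    print(name, vector)
```

Vectors of this kind could then be passed to any binary classifier, such as an SVM, to separate TFs from non-TFs; that training step is not shown here.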

CONCLUSION

The SVM method is a valid and stable means of TF identification with protein domains and functional sites as feature vectors. The ECOC algorithm is a powerful method for multi-class classification problems. When combined with the SVM method, it can remarkably increase the accuracy of TF classification using protein domains and functional sites as feature vectors. In addition, our work suggests that the ECOC algorithm may succeed in a broad range of applications in biological data mining.
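To make the error-correcting idea concrete, the following is a minimal sketch of ECOC decoding, not the authors' implementation. It assumes four hypothetical classes (`class_A` to `class_D`) and a 7-bit exhaustive codebook. In a full system each bit would be predicted by its own binary classifier, for example an SVM; here the predicted bits are supplied by hand.

```python
from itertools import combinations

# Hypothetical codebook: one 7-bit codeword per class.
# Each column defines one binary problem that a separate classifier would learn.
CODEBOOK = {
    "class_A": (1, 1, 1, 1, 1, 1, 1),
    "class_B": (0, 0, 0, 0, 1, 1, 1),
    "class_C": (0, 0, 1, 1, 0, 0, 1),
    "class_D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


def min_code_distance(codebook):
    """Smallest Hamming distance between any two codewords."""
    return min(hamming(a, b) for a, b in combinations(codebook.values(), 2))


# Predicted bits equal to the class_C codeword with the first bit flipped,
# simulating one binary classifier that made a mistake.
noisy_bits = (1, 0, 1, 1, 0, 0, 1)
print(decode(noisy_bits))
print(min_code_distance(CODEBOOK))
```

With a minimum distance of d between codewords, nearest-codeword decoding recovers the correct class whenever at most (d - 1) // 2 of the binary classifiers are wrong, which is the source of the robustness that ECOC adds on top of the individual classifiers.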

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77a7/2440765/01098a2e4401/1471-2105-9-282-1.jpg
