iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou's PseAAC to formulate DNA samples.

Author information

Kabir Muhammad, Hayat Maqsood

Affiliation

Department of Computer Science, Abdul Wali Khan University, Mardan, KP, Pakistan.

Publication information

Mol Genet Genomics. 2016 Feb;291(1):285-96. doi: 10.1007/s00438-015-1108-5. Epub 2015 Aug 30.

DOI:10.1007/s00438-015-1108-5

PMID:26319782

Abstract

Meiotic recombination is vital for maintaining sequence diversity in the human genome. Meiosis and recombination are considered essential phases of cell division. In meiosis, the genome is divided into equal parts for sexual reproduction, whereas in recombination, diverse genomes are combined to form new combinations of genetic variation. Recombination does not occur uniformly across the genome: it is concentrated in specific regions called recombination "hotspots" and occurs rarely in regions called "coldspots". Because the volume of genetic sequences in databanks has grown enormously, identifying these regions with conventional methods is impractical. Given the significance of recombination spots, an accurate, fast, robust, and high-throughput automated computational model is needed. In the proposed model, numerical descriptors are extracted using two sequence representation schemes: dinucleotide composition and trinucleotide composition. The performance of seven classification algorithms was investigated. Finally, the predictions of the individual classifiers are fused into an ensemble using two strategies: majority voting and a genetic algorithm (GA). The GA-based ensemble performed better than both the individual classifiers and the majority-voting ensemble. iRSpot-GAEnsC achieved an accuracy of 84.46%. The empirical results show that iRSpot-GAEnsC outperforms not only the examined individual algorithms but also existing methods reported in the literature. The proposed model is expected to be helpful to the research community, academia, and drug discovery.
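
The abstract names dinucleotide composition (DNC) and trinucleotide composition (TNC) as the two feature schemes but does not spell out how they are computed. The sketch below shows one common formulation, assuming overlapping k-mer windows normalized by the number of valid windows and a plain concatenation into a 16 + 64 = 80-dimensional vector. The function names `kmer_composition` and `dnc_tnc_features` are hypothetical, and the paper's exact normalization may differ.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

def kmer_composition(seq: str, k: int) -> list[float]:
    """Normalized frequencies of all 4**k k-mers over overlapping windows (assumed scheme)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]  # AA, AC, AG, ...
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]

def dnc_tnc_features(seq: str) -> list[float]:
    """Concatenate dinucleotide (16-D) and trinucleotide (64-D) composition into 80-D."""
    return kmer_composition(seq, 2) + kmer_composition(seq, 3)

if __name__ == "__main__":
    vec = dnc_tnc_features("ACGTACGTAA")
    print(len(vec), vec[:4])
```

For the toy sequence `ACGTACGTAA` there are nine overlapping dinucleotide windows, so `AA` (seen once) gets 1/9 and `AC` (seen twice) gets 2/9. Skipping windows that contain ambiguity codes, rather than counting them, is a design choice of this sketch.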

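
The abstract states that the classifiers' outputs are fused by majority voting and by a genetic algorithm, without describing the GA encoding. The following minimal sketch assumes the GA evolves one vote weight in [0, 1] per classifier and scores each weight vector by the accuracy of the resulting weighted vote. The population size, tournament selection, uniform crossover, Gaussian mutation, and all function names are illustrative assumptions, not the authors' settings. Labels are taken as 1 for hotspot and 0 for coldspot.

```python
import random

def majority_vote(preds):
    """preds[c][i] is classifier c's 0/1 label for sample i; ties go to class 0."""
    m = len(preds)
    return [1 if 2 * sum(col) > m else 0 for col in zip(*preds)]

def weighted_vote(preds, weights):
    """Predict 1 when the total weight voting 1 exceeds the total weight voting 0."""
    fused = []
    for col in zip(*preds):
        w1 = sum(w for w, p in zip(weights, col) if p == 1)
        w0 = sum(w for w, p in zip(weights, col) if p == 0)
        fused.append(1 if w1 > w0 else 0)
    return fused

def accuracy(y_pred, y_true):
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)

def ga_weights(preds, y_true, pop_size=30, generations=50,
               mutation_rate=0.2, seed=0):
    """Evolve per-classifier vote weights in [0, 1] (assumed encoding)."""
    rng = random.Random(seed)
    m = len(preds)

    def fitness(weights):
        # Assumed fitness: accuracy of the weighted vote against y_true.
        return accuracy(weighted_vote(preds, weights), y_true)

    # Equal weights reproduce majority voting, so include them in the start population.
    population = [[1.0] * m] + [[rng.random() for _ in range(m)]
                                for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best forward unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Uniform crossover: each gene comes from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation, clipped back into [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

if __name__ == "__main__":
    # Toy data: classifier 0 is reliable; classifiers 1 and 2 make the same mistakes.
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [
        [1, 1, 1, 1, 0, 0, 0, 0],
        [0, 0, 1, 1, 1, 1, 0, 0],
        [0, 0, 1, 1, 1, 1, 0, 0],
    ]
    w = ga_weights(preds, y)
    print("majority vote accuracy:", accuracy(majority_vote(preds), y))
    print("GA-weighted accuracy:", accuracy(weighted_vote(preds, w), y))
```

On the toy data, majority voting reaches only 0.5 because the two correlated classifiers outvote the reliable one. Seeding the population with equal weights, combined with elitism, guarantees the evolved weights never score below majority voting on the data used for fitness. That data should be a training or validation split; computing fitness on the test set would inflate the reported accuracy.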

Similar articles

iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components.

Int J Mol Sci. 2014 Jan 24;15(2):1746-66. doi: 10.3390/ijms15021746.

iACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space.

Artif Intell Med. 2017 Jun;79:62-70. doi: 10.1016/j.artmed.2017.06.008. Epub 2017 Jun 17.

iRSpot-DTS: Predict recombination spots by incorporating the dinucleotide-based spare-cross covariance information into Chou's pseudo components.

Genomics. 2019 Dec;111(6):1760-1770. doi: 10.1016/j.ygeno.2018.11.031. Epub 2018 Dec 6.

RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features.

Nucleic Acids Res. 2007 Jul;35(Web Server issue):W47-51. doi: 10.1093/nar/gkm217. Epub 2007 May 3.

iRSpot-ADPM: Identify recombination spots by incorporating the associated dinucleotide product model into Chou's pseudo components.

J Theor Biol. 2018 Mar 14;441:1-8. doi: 10.1016/j.jtbi.2017.12.025. Epub 2018 Jan 2.

iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition.

Nucleic Acids Res. 2013 Apr 1;41(6):e68. doi: 10.1093/nar/gks1450. Epub 2013 Jan 8.

iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou's PseAAC.

Mol Biosyst. 2016 Jul 19;12(8):2587-93. doi: 10.1039/c6mb00221h.

iRSpot-Pse6NC: Identifying recombination spots in by incorporating hexamer composition into general PseKNC.

Int J Biol Sci. 2018 May 22;14(8):883-891. doi: 10.7150/ijbs.24616. eCollection 2018.

iRSpot-EL: identify recombination spots with an ensemble learning approach.

Bioinformatics. 2017 Jan 1;33(1):35-41. doi: 10.1093/bioinformatics/btw539. Epub 2016 Aug 16.

Cited by

Identification of intelligence-related proteins through a robust two-layer predictor.

Commun Integr Biol. 2022 Nov 15;15(1):253-264. doi: 10.1080/19420889.2022.2143101. eCollection 2022.

iAcety-SmRF: Identification of Acetylation Protein by Using Statistical Moments and Random Forest.

Membranes (Basel). 2022 Feb 25;12(3):265. doi: 10.3390/membranes12030265.

Prediction of Recombination Spots Using Novel Hybrid Feature Extraction Method via Deep Learning Approach.

Front Genet. 2020 Sep 17;11:539227. doi: 10.3389/fgene.2020.539227. eCollection 2020.

Identify Lysine Neddylation Sites Using Bi-profile Bayes Feature Extraction the Chou's 5-steps Rule and General Pseudo Components.

Curr Genomics. 2019 Dec;20(8):592-601. doi: 10.2174/1389202921666191223154629.

iSulfoTyr-PseAAC: Identify Tyrosine Sulfation Sites by Incorporating Statistical Moments Chou's 5-steps Rule and Pseudo Components.

Curr Genomics. 2019 May;20(4):306-320. doi: 10.2174/1389202920666190819091609.

Some illuminating remarks on molecular genetics and genomics as well as drug development.

Mol Genet Genomics. 2020 Mar;295(2):261-274. doi: 10.1007/s00438-019-01634-z. Epub 2020 Jan 1.

RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule.

Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz131.

iCrotoK-PseAAC: Identify lysine crotonylation sites by blending position relative statistical features according to the Chou's 5-step rule.

PLoS One. 2019 Nov 21;14(11):e0223993. doi: 10.1371/journal.pone.0223993. eCollection 2019.

iPseU-CNN: Identifying RNA Pseudouridine Sites Using Convolutional Neural Networks.

Mol Ther Nucleic Acids. 2019 Jun 7;16:463-470. doi: 10.1016/j.omtn.2019.03.010. Epub 2019 Apr 11.

MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters.

Bioinformatics. 2019 Sep 1;35(17):2957-2965. doi: 10.1093/bioinformatics/btz016.

