An evolutionary algorithm approach for feature generation from sequence data and its application to DNA splice site prediction.

Affiliation

Department of Computer Science, George Mason University, Ashburn, VA 20147, USA.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 Sep-Oct;9(5):1387-98. doi: 10.1109/TCBB.2012.53.

Abstract

Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences sought to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the sought property is not known a priori. It is often the task of domain experts or exhaustive feature enumeration techniques to generate a few features whose predictive power is then tested in the context of classification. This paper proposes an evolutionary algorithm to effectively explore a large feature space and generate predictive features from sequence data. The effectiveness of the algorithm is demonstrated on an important component of the gene-finding problem, DNA splice site prediction. This application is chosen due to the complexity of the features needed to obtain high classification accuracy and precision. Our results test the effectiveness of the obtained features in the context of classification by Support Vector Machines and show significant improvement in accuracy and precision over state-of-the-art approaches.
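The abstract does not specify the feature representation, the variation operators, or the fitness function, so the following is only a minimal sketch of the general idea of evolving sequence features, not the authors' method. It assumes a hypothetical feature of the form `(position, motif)`, equal-length DNA sequences, and a simple filter-style fitness (the difference in match rate between positive and negative sequences) in place of the Support Vector Machine evaluation the abstract mentions. All function names and parameters are illustrative.

```python
import random

ALPHABET = "ACGT"

def matches(feature, seq):
    """Hypothetical feature (position, motif): true if motif occurs at position."""
    pos, motif = feature
    return seq[pos:pos + len(motif)] == motif

def fitness(feature, positives, negatives):
    """Assumed surrogate score: absolute difference in match rate, in [0, 1]."""
    p = sum(matches(feature, s) for s in positives) / len(positives)
    n = sum(matches(feature, s) for s in negatives) / len(negatives)
    return abs(p - n)

def random_feature(rng, seq_len, k):
    """Draw a random motif of length k at a random valid position."""
    pos = rng.randrange(seq_len - k + 1)
    motif = "".join(rng.choice(ALPHABET) for _ in range(k))
    return (pos, motif)

def mutate(feature, rng, seq_len):
    """Either shift the position by one or resample one motif character."""
    pos, motif = feature
    if rng.random() < 0.5:
        pos = min(max(pos + rng.choice((-1, 1)), 0), seq_len - len(motif))
    else:
        i = rng.randrange(len(motif))
        motif = motif[:i] + rng.choice(ALPHABET) + motif[i + 1:]
    return (pos, motif)

def evolve(positives, negatives, k=2, pop_size=30, generations=40, seed=0):
    """Truncation selection with mutation; returns distinct features, best first."""
    rng = random.Random(seed)
    seq_len = len(positives[0])

    def score(f):
        return fitness(f, positives, negatives)

    population = [random_feature(rng, seq_len, k) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged and refill with mutated copies.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng, seq_len)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    # Deduplicate while preserving order, then rank by score.
    return sorted(dict.fromkeys(population), key=score, reverse=True)
```

Calling `evolve(positives, negatives)` with two lists of equal-length DNA strings returns candidate features ranked by the surrogate score. In a fuller pipeline, the top-ranked features would be turned into a binary feature vector per sequence and passed to a classifier for evaluation.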

Similar articles

An evolutionary algorithm approach for feature generation from sequence data and its application to DNA splice site prediction.

IEEE/ACM Trans Comput Biol Bioinform. 2012 Sep-Oct;9(5):1387-98. doi: 10.1109/TCBB.2012.53.

Fast splice site detection using information content and feature reduction.

BMC Bioinformatics. 2008 Dec 12;9 Suppl 12(Suppl 12):S8. doi: 10.1186/1471-2105-9-S12-S8.

SpliceIT: a hybrid method for splice signal identification based on probabilistic and biological inference.

J Biomed Inform. 2010 Apr;43(2):208-17. doi: 10.1016/j.jbi.2009.09.004. Epub 2009 Sep 30.

An improved heuristic algorithm for finding motif signals in DNA sequences.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Jul-Aug;8(4):959-75. doi: 10.1109/TCBB.2010.92.

EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences.

BMC Bioinformatics. 2006 Jul 13;7:342. doi: 10.1186/1471-2105-7-342.

Feature subset selection for splice site prediction.

Bioinformatics. 2002;18 Suppl 2:S75-83. doi: 10.1093/bioinformatics/18.suppl_2.s75.

SplicePort--an interactive splice-site analysis tool.

Nucleic Acids Res. 2007 Jul;35(Web Server issue):W285-91. doi: 10.1093/nar/gkm407. Epub 2007 Jun 18.

A novel approach for accurate identification of splice junctions based on hybrid algorithms.

J Biomol Struct Dyn. 2015;33(6):1281-90. doi: 10.1080/07391102.2014.944218. Epub 2014 Sep 9.

Effective automated feature construction and selection for classification of biological sequences.

PLoS One. 2014 Jul 17;9(7):e99982. doi: 10.1371/journal.pone.0099982. eCollection 2014.

HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features.

Biomed Res Int. 2017;2017:4590609. doi: 10.1155/2017/4590609. Epub 2017 Nov 14.

Cited by

An automated framework for evaluation of deep learning models for splice site predictions.

Sci Rep. 2023 Jun 23;13(1):10221. doi: 10.1038/s41598-023-34795-4.

SNPlice: variants that modulate Intron retention from RNA-sequencing data.

Bioinformatics. 2015 Apr 15;31(8):1191-8. doi: 10.1093/bioinformatics/btu804. Epub 2014 Dec 6.

Effective automated feature construction and selection for classification of biological sequences.

PLoS One. 2014 Jul 17;9(7):e99982. doi: 10.1371/journal.pone.0099982. eCollection 2014.

A survey on evolutionary algorithm based hybrid intelligence in bioinformatics.

Biomed Res Int. 2014;2014:362738. doi: 10.1155/2014/362738. Epub 2014 Mar 6.

Achieving high accuracy prediction of minimotifs.

PLoS One. 2012;7(9):e45589. doi: 10.1371/journal.pone.0045589. Epub 2012 Sep 27.
