An evolutionary algorithm approach for feature generation from sequence data and its application to DNA splice site prediction.

Affiliation

Department of Computer Science, George Mason University, Ashburn, VA 20147, USA.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2012 Sep-Oct;9(5):1387-98. doi: 10.1109/TCBB.2012.53.

Abstract

Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences sought to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the sought property is not known a priori. It is often the task of domain experts or exhaustive feature enumeration techniques to generate a few features whose predictive power is then tested in the context of classification. This paper proposes an evolutionary algorithm to effectively explore a large feature space and generate predictive features from sequence data. The effectiveness of the algorithm is demonstrated on an important component of the gene-finding problem, DNA splice site prediction. This application is chosen due to the complexity of the features needed to obtain high classification accuracy and precision. Our results test the effectiveness of the obtained features in the context of classification by Support Vector Machines and show significant improvement in accuracy and precision over state-of-the-art approaches.
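The abstract does not spell out the algorithm's representation or operators, so what follows is only a minimal Python sketch of the general idea (an evolutionary search over a space of sequence features). It is not the authors' method. It rests on several assumptions. Each feature is a hypothetical fixed-length k-mer that is either present or absent anywhere in a sequence. The fitness is a simple stand-in: the occurrence rate in positive (true splice site) sequences minus the rate in negative sequences, used in place of the classification-based evaluation the paper describes. The search itself uses generic truncation selection with elitism, one-point crossover, and point mutation. All names (`evolve_features`, `fitness`, `to_feature_vector`, and so on) are illustrative.

```python
import random
from typing import List, Sequence, Tuple

ALPHABET = "ACGT"


def has_motif(seq: str, motif: str) -> bool:
    """Hypothetical binary feature: True if the motif occurs anywhere in seq."""
    return motif in seq


def fitness(motif: str, positives: Sequence[str], negatives: Sequence[str]) -> float:
    """Stand-in fitness (assumption): positive occurrence rate minus negative occurrence rate."""
    pos_rate = sum(has_motif(s, motif) for s in positives) / len(positives)
    neg_rate = sum(has_motif(s, motif) for s in negatives) / len(negatives)
    return pos_rate - neg_rate


def mutate(motif: str, rng: random.Random) -> str:
    """Point mutation: replace one randomly chosen base."""
    i = rng.randrange(len(motif))
    return motif[:i] + rng.choice(ALPHABET) + motif[i + 1:]


def crossover(a: str, b: str, rng: random.Random) -> str:
    """One-point crossover between two equal-length motifs (requires length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve_features(positives: Sequence[str], negatives: Sequence[str], k: int = 4,
                    pop_size: int = 30, generations: int = 40, elite: int = 5,
                    mutation_rate: float = 0.3, seed: int = 0) -> Tuple[List[str], List[float]]:
    """Evolve k-mer features; return the best distinct motifs and the best fitness per generation."""
    rng = random.Random(seed)

    def score(m: str) -> float:
        return fitness(m, positives, negatives)

    population = ["".join(rng.choice(ALPHABET) for _ in range(k)) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        rng.shuffle(population)  # break fitness ties randomly before the stable sort
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        parents = ranked[:elite]  # elitism: the best motifs survive unchanged
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = parents + children
    # Deduplicate while keeping order, then rank by fitness.
    unique = list(dict.fromkeys(population))
    best = sorted(unique, key=score, reverse=True)[:elite]
    return best, history


def to_feature_vector(seq: str, motifs: Sequence[str]) -> List[int]:
    """Binary feature vector for one sequence, e.g. as input to an SVM."""
    return [int(has_motif(seq, m)) for m in motifs]


if __name__ == "__main__":
    # Hypothetical toy data: positives embed "GTAAGT", loosely mimicking a donor-site consensus.
    data_rng = random.Random(1)

    def random_seq(n: int) -> str:
        return "".join(data_rng.choice(ALPHABET) for _ in range(n))

    positives = [random_seq(10) + "GTAAGT" + random_seq(10) for _ in range(50)]
    negatives = [random_seq(26) for _ in range(50)]
    motifs, history = evolve_features(positives, negatives)
    print("top motifs:", motifs)
    print("best fitness per generation:", history)
    print("features of first positive:", to_feature_vector(positives[0], motifs))
```

Because the elite motifs are copied into every new generation, the best fitness recorded in `history` can never decrease. The binary vectors from `to_feature_vector` could then be passed to any SVM implementation. The paper's actual feature types, fitness evaluation, and SVM setup may differ from this sketch.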
