A machine learning method based on the genetic and world competitive contests algorithms for selecting genes or features in biological applications.

Affiliations

Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.

Department of Bioinformatics, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran.

Publication information

Sci Rep. 2021 Feb 8;11(1):3349. doi: 10.1038/s41598-021-82796-y.

Abstract

Gene/feature selection is an essential preprocessing step for building models with machine learning techniques. It also plays a critical role in biological applications such as biomarker identification. Although many feature/gene selection algorithms and methods have been introduced, they may suffer from problems such as difficult parameter tuning or low performance. To address these limitations, this study introduces a universal wrapper approach based on the genetic algorithm (GA) and the world competitive contests algorithm, an optimization algorithm that we introduced. In the proposed approach, candidate solutions have variable lengths, and a support vector machine scores them. To show the usefulness of the method, thirteen classification and regression datasets with different properties were chosen from various biological areas, including drug discovery, cancer diagnostics, and clinical applications. Our findings confirmed that the proposed method outperforms most other currently used approaches and frees users from the difficulties of tuning various parameters. As a result, users may optimize their biological applications, for example by obtaining a biomarker diagnostic kit with the minimum number of genes and maximum separability power.
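The wrapper idea in the abstract (variable-length candidate feature subsets, each scored by a classifier) can be illustrated with a minimal sketch. This is not the authors' method: it uses a plain genetic algorithm only, omits the world competitive contests algorithm, and swaps the support vector machine for a leave-one-out nearest-centroid scorer so that the example needs nothing beyond the Python standard library. The toy data, the function names (`make_toy_data`, `loo_accuracy`, `select_features`), and the length penalty of 0.01 per feature are all assumptions made for illustration.

```python
import random


def make_toy_data(n_samples=30, n_features=12, informative=(0, 1), seed=0):
    """Hypothetical toy data: two classes; only `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift the informative features for class 1
        X.append(row)
        y.append(label)
    return X, y


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier on `subset`.

    Stand-in scorer (assumption): the paper scores candidates with an SVM.
    """
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        sums, counts = {}, {}
        for k, (xk, yk) in enumerate(zip(X, y)):
            if k == i:
                continue  # hold out sample i
            acc = sums.setdefault(yk, [0.0] * len(subset))
            for pos, j in enumerate(subset):
                acc[pos] += xk[j]
            counts[yk] = counts.get(yk, 0) + 1

        def dist(label):
            return sum((xi[j] - sums[label][pos] / counts[label]) ** 2
                       for pos, j in enumerate(subset))

        correct += min(sums, key=dist) == yi
    return correct / len(X)


def fitness(X, y, subset, penalty=0.01):
    """Reward accuracy and penalize long candidates (penalty is an assumption)."""
    return loo_accuracy(X, y, subset) - penalty * len(subset)


def crossover(a, b, rng):
    """Child keeps each feature from either parent with probability 0.5."""
    pool = sorted(set(a) | set(b))
    child = [j for j in pool if rng.random() < 0.5]
    return child or [rng.choice(pool)]  # never return an empty candidate


def mutate(subset, n_features, rng):
    """Drop one feature or add one, so candidate length can change."""
    child = set(subset)
    if rng.random() < 0.5 and len(child) > 1:
        child.discard(rng.choice(sorted(child)))
    else:
        child.add(rng.randrange(n_features))
    return sorted(child)


def select_features(X, y, pop_size=20, generations=10, seed=1):
    """Plain elitist GA over variable-length feature-index lists."""
    rng = random.Random(seed)
    n_features = len(X[0])
    population = [sorted(rng.sample(range(n_features), rng.randint(1, n_features)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: fitness(X, y, s), reverse=True)
        elite = ranked[: pop_size // 2]  # the better half survives unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng),
                           n_features, rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return max(population, key=lambda s: fitness(X, y, s))


X, y = make_toy_data()
best = select_features(X, y)
print("selected feature indices:", best)
print("leave-one-out accuracy:", loo_accuracy(X, y, best))
```

Representing each candidate as a list of feature indices, rather than a fixed-length bit mask, is one simple way to let candidate length vary; the length penalty then pushes the search toward small subsets, in the spirit of the "minimum number of genes" goal stated in the abstract.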

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/57cd/7870651/936cf1ff97b2/41598_2021_82796_Fig1_HTML.jpg
