Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
Department of Bioinformatics, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran.
Sci Rep. 2021 Feb 8;11(1):3349. doi: 10.1038/s41598-021-82796-y.
Gene/feature selection is an essential preprocessing step for building models with machine learning techniques. It also plays a critical role in biological applications such as biomarker identification. Although many feature/gene selection algorithms and methods have been introduced, they may suffer from problems such as the need for parameter tuning or low performance. To address these limitations, this study introduces a universal wrapper approach based on an optimization algorithm that we introduced and the genetic algorithm (GA). In the proposed approach, candidate solutions have variable lengths, and a support vector machine scores them. To demonstrate the usefulness of the method, thirteen classification and regression datasets with different properties were chosen from various biological areas, including drug discovery, cancer diagnostics, and clinical applications. Our findings confirmed that the proposed method outperforms most other currently used approaches and also frees users from the difficulties of tuning various parameters. As a result, users may optimize their biological applications, for example by obtaining a biomarker diagnostic kit with the minimum number of genes and maximum separability power.
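To make the idea of a wrapper with variable-length candidates concrete, the following is a minimal sketch and not the authors' method. It assumes a plain genetic algorithm only (the authors' own optimization algorithm is not reproduced), and it swaps in a nearest-centroid classifier as a dependency-free stand-in for the support vector machine scorer. The toy data, the per-feature penalty, and every function name (`make_data`, `fitness`, `select_features`, and so on) are hypothetical and chosen for illustration.

```python
import random

def make_data(rng, n=60, d=12):
    """Toy two-class data (assumption): features 0 and 1 carry signal, the rest are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, feats, train_idx, test_idx):
    """Holdout accuracy of a nearest-centroid classifier (stand-in for the SVM scorer)."""
    cents = {}
    for c in (0, 1):
        rows = [X[i] for i in train_idx if y[i] == c]
        cents[c] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    hits = 0
    for i in test_idx:
        dist = {c: sum((X[i][f] - m) ** 2 for f, m in zip(feats, cents[c]))
                for c in cents}
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(test_idx)

def fitness(X, y, feats, penalty=0.01):
    """Score a candidate: holdout accuracy minus a small cost per selected feature."""
    n = len(X)
    split = 2 * n // 3
    acc = centroid_accuracy(X, y, feats, range(split), range(split, n))
    return acc - penalty * len(feats)

def random_candidate(rng, d):
    """A candidate is a sorted list of distinct feature indices of random length."""
    return sorted(rng.sample(range(d), rng.randint(1, d)))

def crossover(rng, a, b):
    """Child draws a random-length subset from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return sorted(rng.sample(pool, rng.randint(1, len(pool))))

def mutate(rng, cand, d, rate=0.3):
    """With probability `rate`, toggle one feature; never empty the candidate."""
    s = set(cand)
    if rng.random() < rate:
        f = rng.randrange(d)
        if f in s and len(s) > 1:
            s.remove(f)
        else:
            s.add(f)
    return sorted(s)

def select_features(X, y, pop_size=20, generations=15, seed=0):
    """Simple elitist GA over variable-length feature subsets."""
    rng = random.Random(seed)
    d = len(X[0])
    score = lambda c: fitness(X, y, c)
    pop = [random_candidate(rng, d) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        elite = pop[:2]  # carry the two best candidates over unchanged
        children = []
        while len(children) < pop_size - len(elite):
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            children.append(mutate(rng, crossover(rng, p1, p2), d))
        pop = elite + children
    best = max(pop, key=score)
    history.append(score(best))
    return best, history

X, y = make_data(random.Random(42))
best, history = select_features(X, y)
print("selected features:", best, "fitness:", round(history[-1], 3))
```

Because the scorer is called only through `fitness`, replacing the stand-in with an actual support vector machine (or a regressor for regression datasets) would not change the search loop. The per-feature penalty is one simple way to express the trade-off between a minimal gene set and separability; the source does not state how the authors balance the two.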