College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China.
Sci Rep. 2017 Oct 12;7(1):13013. doi: 10.1038/s41598-017-13259-6.
The advent of the big data era has imposed challenges in both running time and learning efficiency on machine learning researchers. Biomedical OMIC research is one of these big data areas and has drastically changed biomedical research. However, the high cost of data production and the difficulty of participant recruitment have introduced the "large p, small n" paradigm into biomedical research. Feature selection is usually employed to reduce the large number of biomedical features so that a stable, data-independent classification or regression model may be achieved. This study randomly changes the first element of the widely used incremental feature selection (IFS) strategy and selects the best feature subset, which may consist of features ranked low by statistical association evaluation algorithms such as the t-test. The hypothesis is that two low-ranked features may be orchestrated to achieve good classification performance. The proposed Randomly re-started Incremental Feature Selection (RIFS) algorithm achieves both higher classification accuracy and a smaller number of features than the existing algorithms. RIFS also outperforms the existing methylomic diagnosis model for prostate malignancy, with higher accuracy and fewer transcriptomic features.
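The abstract describes RIFS only at a high level. The following is a minimal Python sketch of the idea under several assumptions that are not taken from the paper: features are ranked by an absolute Welch t-statistic, each run walks down the ranked list from a start position and stops as soon as adding the next feature no longer improves leave-one-out accuracy, and a simple nearest-centroid classifier stands in for whatever classifiers the authors actually used. All function names (`rank_features`, `loo_accuracy`, `ifs_from`, `rifs`) and parameters (`n_restarts`, `max_len`, `seed`) are hypothetical. A start position of 0 corresponds to the classic IFS strategy; the random restarts give low-ranked features a chance to be combined.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples (larger = stronger association)."""
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0:
        return math.inf if mean(a) != mean(b) else 0.0
    return abs(mean(a) - mean(b)) / denom


def rank_features(X, y):
    """Return feature indices sorted by decreasing |t| (binary labels 0/1)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(welch_t(pos, neg))
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def sq_dist(row, centroid, features):
    """Squared Euclidean distance restricted to the selected features."""
    return sum((row[j] - c) ** 2 for j, c in zip(features, centroid))


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in model)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean(r[j] for r in rows) for j in features]
        pred = min((0, 1), key=lambda lab: sq_dist(X[i], centroids[lab], features))
        correct += pred == y[i]
    return correct / len(X)


def ifs_from(X, y, ranked, start, max_len):
    """IFS beginning at position `start` of the ranked list.

    Assumed stopping rule: stop when the next feature does not improve accuracy.
    """
    subset, best_acc = [], 0.0
    for j in ranked[start:start + max_len]:
        acc = loo_accuracy(X, y, subset + [j])
        if acc <= best_acc:
            break
        subset, best_acc = subset + [j], acc
    return subset, best_acc


def rifs(X, y, n_restarts=10, max_len=5, seed=0):
    """Randomly re-started IFS (sketch): run IFS from several start positions."""
    ranked = rank_features(X, y)
    rng = random.Random(seed)
    # Assumption: the first run uses start 0 (classic IFS); the rest start at random ranks.
    starts = [0] + [rng.randrange(len(ranked)) for _ in range(n_restarts - 1)]
    best_subset, best_acc = [], -1.0
    for start in starts:
        subset, acc = ifs_from(X, y, ranked, start, max_len)
        # Prefer higher accuracy; on ties, prefer fewer features.
        if acc > best_acc or (acc == best_acc and len(subset) < len(best_subset)):
            best_subset, best_acc = subset, acc
    return best_subset, best_acc
```

For example, `subset, acc = rifs(X, y, n_restarts=20, max_len=5)` returns the selected feature indices and their leave-one-out accuracy. Including start 0 among the runs means this sketch never does worse than plain IFS on its own evaluation criterion; the tie-breaking rule reflects the abstract's emphasis on small feature subsets.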