Department of Computer and Information Science, University of Mississippi, University, MS 38677, USA.
IEEE Trans Nanobioscience. 2012 Sep;11(3):228-36. doi: 10.1109/TNB.2012.2213264.
Many biology-related research problems combine multiple sources of data to better understand the underlying problem. Selecting and interpreting the most important information from these sources is essential. An algorithm that simultaneously extracts rules and selects features would therefore make the predictive model easier to interpret. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of the rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. Using only a small number of decision rules, CRF achieves performance comparable with state-of-the-art prediction algorithms. Some of the decision rules are biologically significant.
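The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea: encode each sample by which rules fire on it, fit a 1-norm penalized model over those rule indicators, and keep the rules with nonzero weight together with the features they mention. Everything here is an assumption made for illustration: the hand-written rules stand in for paths taken from a random forest, a lasso solved by proximal gradient descent stands in for whatever 1-norm regularized learner CRF actually uses, and all names (`rule_fires`, `encode`, `lasso_ista`, `select_rules_and_features`) and the toy data are hypothetical.

```python
from itertools import product

# Illustrative sketch only; not the authors' implementation of CRF.
# A rule is a list of (feature_index, op, threshold) conditions joined by AND.

def rule_fires(rule, x):
    """Return True when every condition of the rule holds for sample x."""
    return all((x[j] <= t) if op == "<=" else (x[j] > t) for j, op, t in rule)

def encode(rules, X):
    """Binary matrix whose entry [i][k] is 1.0 when rule k fires on sample i."""
    return [[1.0 if rule_fires(r, x) else 0.0 for r in rules] for x in X]

def soft_threshold(v, lam):
    """Proximal operator of lam * |v|."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def lasso_ista(A, y, lam, iters=2000):
    """Minimize (1/2n)||Aw - y||^2 + lam * ||w||_1 by proximal gradient descent."""
    n, p = len(A), len(A[0])
    step = 1.0 / p  # safe for 0/1 entries: the gradient's Lipschitz constant is <= p
    w = [0.0] * p
    for _ in range(iters):
        resid = [sum(A[i][k] * w[k] for k in range(p)) - y[i] for i in range(n)]
        grad = [sum(A[i][k] * resid[i] for i in range(n)) / n for k in range(p)]
        w = [soft_threshold(w[k] - step * grad[k], step * lam) for k in range(p)]
    return w

def select_rules_and_features(rules, X, y, lam=0.05):
    """Keep rules with nonzero weight and the features those rules use."""
    w = lasso_ista(encode(rules, X), y, lam)
    kept = [k for k, wk in enumerate(w) if abs(wk) > 1e-8]
    features = sorted({j for k in kept for j, _, _ in rules[k]})
    return w, kept, features

# Toy data (made up): three features, label depends only on feature 0.
X = [list(x) for x in product((0.2, 0.8), (0.1, 0.9), (0.4, 0.9))]
y = [1.0 if x[0] > 0.5 else 0.0 for x in X]

# Hypothetical candidate rules, standing in for random-forest paths.
rules = [
    [(0, ">", 0.5)],                 # informative rule
    [(1, "<=", 0.3)],                # uninformative rule
    [(2, ">", 0.7), (1, ">", 0.3)],  # uninformative two-condition rule
]

weights, kept_rules, kept_features = select_rules_and_features(rules, X, y)
print("kept rules:", kept_rules)
print("kept features:", kept_features)
```

On this toy data the penalty should drive the weights of the two uninformative rules to zero, so only the rule on feature 0 and that single feature remain. In practice the candidate rules would come from a trained forest rather than being written by hand, and the penalty strength `lam` would need tuning.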