Nguyen Jean-Michel, Jézéquel Pascal, Gillois Pierre, Silva Luisa, Ben Azzouz Faouda, Lambert-Lacroix Sophie, Juin Philippe, Campone Mario, Gaultier Aurélie, Moreau-Gaudry Alexandre, Antonioli Daniel
Techniques de l'Ingénierie Médicale et de la Complexité - Informatique, Mathématiques, Applications (TIMC-IMAG) - UMR 5525, Université Grenoble Alpes-CNRS, France.
CRCINA - INCIT Department - Team 2 - 8, quai Moncousu - BP 70721 - 44007 Nantes cedex 1, France.
Bioinformatics. 2021 Aug 9;37(15):2165-2174. doi: 10.1093/bioinformatics/btab074.
The principle of Breiman's random forest (RF) is to build and assemble complementary classification trees in a way that maximizes their variability. We propose a new type of random forest that disobeys Breiman's principles and involves building very large numbers of trees with no classification errors. We used a new type of decision tree that has a neuron at each node and an innovative half Christmas tree structure. With these new RFs, we developed a score, based on a family of ten new statistical information criteria called Nguyen information criteria (NICs), to evaluate the predictive qualities of features in three dimensions.
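To illustrate what a neuron at a tree node can do, the following is a minimal sketch and not the ROP implementation. The abstract does not specify the neuron type or its training rule, so a Rosenblatt perceptron is assumed here, and the names `activate` and `train_neuron` as well as the toy data are hypothetical. The data are chosen so that no threshold on a single feature separates the classes, while a weighted combination of both features does.

```python
def activate(weights, bias, row):
    """Route a sample: 1 if the neuron fires, 0 otherwise."""
    score = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if score > 0 else 0


def train_neuron(X, y, max_epochs=1000):
    """Perceptron rule (an assumption, not the authors' training method)."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for row, label in zip(X, y):
            error = label - activate(weights, bias, row)  # -1, 0 or +1
            if error != 0:
                weights = [w + error * x for w, x in zip(weights, row)]
                bias += error
                mistakes += 1
        if mistakes == 0:  # every training sample is routed correctly
            break
    return weights, bias


# Toy data: class 1 when x0 + x1 is large; neither feature alone suffices.
X = [[0, 0], [2, 0], [0, 2], [1, 1], [3, 0], [0, 3], [2, 2], [1.5, 1.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_neuron(X, y)
predictions = [activate(weights, bias, row) for row in X]
print(predictions == y)
```

Because the toy classes are linearly separable, the perceptron convergence theorem guarantees that training stops with no errors on these samples. A conventional node that thresholds one feature at a time would need several levels to reach the same result.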
When the features were introduced in a logistic regression model, the first NIC allowed the Akaike information criterion to be minimized more quickly than the Gini index did. The features selected on the basis of the NICScore showed a slight advantage over those selected by the support vector machines-recursive feature elimination (SVM-RFE) method. We demonstrate that the inclusion of artificial neurons in tree nodes allows a large number of classifiers in the same node to be taken into account simultaneously and results in perfect trees without classification errors.
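The comparison of how quickly the Akaike information criterion (AIC) falls can be sketched as follows. This is an illustrative assumption, not the authors' evaluation code. Features are added to a logistic regression one at a time in a given ranking order, and AIC = 2k - 2 ln L is recorded at each step. The two orders below are hypothetical stand-ins for rankings produced by different criteria. The NIC and Gini computations themselves are not reproduced, and the plain gradient-ascent fit and the deterministic toy data are likewise assumptions.

```python
import math


def sigmoid(z):
    return 0.5 * (1.0 + math.tanh(0.5 * z))  # overflow-safe logistic function


def linear(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def fit_logistic(X, y, steps=500, lr=0.5):
    """Maximum-likelihood fit by plain gradient ascent (intercept in beta[0])."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            resid = yi - sigmoid(linear(beta, row))
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def log_likelihood(X, y, beta):
    total = 0.0
    for row, yi in zip(X, y):
        z = linear(beta, row)
        # log p(y | x) = y*z - log(1 + e^z), written in a stable form
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def aic_path(X, y, order):
    """AIC after introducing each feature in the given ranking order."""
    path = []
    for k in range(1, len(order) + 1):
        Xk = [[row[j] for j in order[:k]] for row in X]
        beta = fit_logistic(Xk, y)
        path.append(aic(log_likelihood(Xk, y, beta), len(beta)))
    return path


# Deterministic toy data: feature 0 is informative, feature 1 is near-noise.
n = 40
y = [1 if i >= 20 else 0 for i in range(n)]
y[17], y[22] = 1, 0  # two label flips so the classes are not perfectly separable
X = [[(i - 19.5) / 10, ((7 * i) % 10 - 4.5) / 3] for i in range(n)]

path_informative_first = aic_path(X, y, [0, 1])
path_noise_first = aic_path(X, y, [1, 0])
print(path_informative_first, path_noise_first)
```

Both paths end at the same full model, so the rankings differ only in how early the AIC drops. A ranking that places informative features first reaches a low AIC after fewer features.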
The methods used to build the perfect trees in this article were implemented in the 'ROP' R package, archived at https://cran.r-project.org/web/packages/ROP/index.html.
Supplementary data are available at Bioinformatics online.