Gu Lin, Zhang Xiaowei, You Shaodi, Zhao Shen, Liu Zhenzhong, Harada Tatsuya
RIKEN AIP, Tokyo, Japan.
Research Center for Advanced Science and Technology (RCAST), The University of Tokyo, Tokyo, Japan.
Front Neuroinform. 2020 Nov 10;14:601829. doi: 10.3389/fninf.2020.601829. eCollection 2020.
One major challenge in medical image analysis is the scarcity of labels and annotations, which usually require medical knowledge and training to produce. This issue is particularly serious in brain image analysis, for example in the analysis of retinal vasculature, which directly reflects the vascular condition of the central nervous system (CNS). In this paper, we present a novel semi-supervised learning algorithm that boosts the performance of random forests when labeled data are limited by exploiting the local structure of unlabeled data. We identify the information gain calculation as the key bottleneck of random forests and replace it with a graph-embedded entropy, which is more reliable when labeled data are insufficient. By suitably modifying the training process of the standard random forest, our algorithm significantly improves performance while preserving the virtues of random forests, such as low computational burden and robustness to over-fitting. Our method shows superior performance on both medical image analysis tasks and machine learning benchmarks.
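The abstract does not define the graph-embedded entropy, so the following is only a rough sketch of the general idea and not the authors' formulation. It assumes a simple stand-in: soft labels are spread over a k-nearest-neighbour graph by label propagation, and the split criterion is then computed on those soft labels instead of on the few hard labels. All function names (`information_gain`, `propagate_labels`, `soft_information_gain`) and parameters (`k`, `n_iter`) are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def information_gain(y, left_mask, n_classes):
    """Standard information gain, computed from labeled samples only."""
    def h(labels):
        if labels.size == 0:
            return 0.0
        return entropy(np.bincount(labels, minlength=n_classes) / labels.size)
    n = y.size
    yl, yr = y[left_mask], y[~left_mask]
    return h(y) - (yl.size / n) * h(yl) - (yr.size / n) * h(yr)

def propagate_labels(X, y, n_classes, k=3, n_iter=200):
    """Assumed stand-in for the graph step: label propagation on a kNN graph.
    y == -1 marks unlabeled points. Returns an (n, n_classes) soft-label matrix."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # no self-neighbours
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                       # symmetrise the graph
    P = W / W.sum(axis=1, keepdims=True)         # row-normalised transitions
    labeled = y >= 0
    onehot = np.eye(n_classes)[y[labeled]]
    F = np.full((n, n_classes), 1.0 / n_classes)
    F[labeled] = onehot
    for _ in range(n_iter):
        F = P @ F
        F[labeled] = onehot                      # clamp the known labels
    return F

def soft_information_gain(F, left_mask):
    """Information gain computed on soft labels of all points (labeled or not)."""
    def h(rows):
        return entropy(rows.mean(axis=0)) if rows.shape[0] else 0.0
    n = F.shape[0]
    left, right = F[left_mask], F[~left_mask]
    return h(F) - (left.shape[0] / n) * h(left) - (right.shape[0] / n) * h(right)

# Toy data: two 1-D clusters of ten points, with one labeled point in each.
X = np.concatenate([np.arange(10.0), 100.0 + np.arange(10.0)])[:, None]
y = np.full(20, -1)
y[0], y[10] = 0, 1
lab = y >= 0

F = propagate_labels(X, y, n_classes=2)
edge_split = X[:, 0] < 0.5    # cuts off a single point
mid_split = X[:, 0] < 50.0    # separates the two clusters

gain_lab_edge = information_gain(y[lab], edge_split[lab], 2)
gain_lab_mid = information_gain(y[lab], mid_split[lab], 2)
gain_soft_edge = soft_information_gain(F, edge_split)
gain_soft_mid = soft_information_gain(F, mid_split)
```

In this toy setting the labeled-only criterion cannot tell the two candidate splits apart, because both separate the only two labeled points, while the criterion computed on propagated soft labels prefers the split that follows the cluster structure of the unlabeled data.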