IBISC, Université Paris-Saclay (Univ. Evry), 23 boulevard de France, 91034, Evry, France.
BMC Bioinformatics. 2022 Jul 3;23(1):262. doi: 10.1186/s12859-022-04807-7.
Machine learning is now a standard tool for cancer prediction based on gene expression data. However, deep learning is still relatively new for this task, and there is no clear consensus about its performance and utility. Few experimental studies have evaluated deep neural networks and compared them with state-of-the-art machine learning methods, and their conclusions are inconsistent.
We extensively evaluate the deep learning approach on 22 cancer prediction tasks based on gene expression data. We measure the impact of the main hyper-parameters and compare the performance of neural networks with that of state-of-the-art methods. We also investigate the effectiveness of several transfer learning schemes in different experimental setups.
Based on our experiments, we provide several recommendations for optimizing the construction and training of a neural network model. We show that neural networks outperform state-of-the-art methods only when the training set is very large. For small training sets, we show that transfer learning is possible and can strongly improve model performance in some cases.
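The abstract does not say which transfer learning schemes were evaluated. Purely as a generic illustration, here is a minimal sketch of one common scheme, feature reuse. A small network is pretrained on a large source task. Its hidden layer is then copied and frozen, and only a fresh output layer is trained on a small target task. Everything here is an assumption and not the authors' method. The names `TinyMLP`, `make_task` and `transfer` are hypothetical, the data are synthetic Gaussian vectors rather than real gene expression profiles, and the example uses plain Python instead of a deep learning framework.

```python
import math
import random

def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """Hypothetical toy network: one tanh hidden layer + sigmoid output (binary task)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.v = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
        self.c = 0.0

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def predict(self, x):
        h = self.hidden(x)
        return sigmoid(sum(vj * hj for vj, hj in zip(self.v, h)) + self.c)

    def train(self, X, y, epochs=10, lr=0.1, freeze_hidden=False):
        # Plain per-sample SGD on binary cross-entropy
        for _ in range(epochs):
            for x, t in zip(X, y):
                h = self.hidden(x)
                p = sigmoid(sum(vj * hj for vj, hj in zip(self.v, h)) + self.c)
                g = p - t  # d(loss)/d(output logit)
                if not freeze_hidden:
                    # Backpropagate through tanh using the pre-update output weights
                    for j, hj in enumerate(h):
                        gh = g * self.v[j] * (1.0 - hj * hj)
                        self.W[j] = [w - lr * gh * xi for w, xi in zip(self.W[j], x)]
                        self.b[j] -= lr * gh
                self.v = [vj - lr * g * hj for vj, hj in zip(self.v, h)]
                self.c -= lr * g

def make_task(n, n_genes, informative, rng):
    # Synthetic toy data (assumption): label = sign of the sum of a few features
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        X.append(x)
        y.append(1.0 if sum(x[i] for i in informative) > 0 else 0.0)
    return X, y

def transfer(source_model, rng):
    # Copy the pretrained hidden layer; the output layer is freshly initialised
    target = TinyMLP(len(source_model.W[0]), len(source_model.W), rng)
    target.W = [row[:] for row in source_model.W]
    target.b = source_model.b[:]
    return target

rng = random.Random(0)
n_genes, n_hidden = 20, 8
X_src, y_src = make_task(500, n_genes, [0, 1, 2, 3], rng)  # large source task
X_tgt, y_tgt = make_task(30, n_genes, [0, 1, 2], rng)      # small, related target task

source = TinyMLP(n_genes, n_hidden, rng)
source.train(X_src, y_src, epochs=10)

target = transfer(source, rng)
target.train(X_tgt, y_tgt, epochs=100, freeze_hidden=True)  # only v and c change
```

Freezing the copied layer keeps the number of trainable parameters small, which suits the small target set. Another common variant fine-tunes all layers at a lower learning rate. Whether either variant helps depends on how closely the source and target tasks are related.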