Michael P. H. Stumpf
School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia.
Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
J R Soc Interface. 2020 Oct;17(171):20200419. doi: 10.1098/rsif.2020.0419. Epub 2020 Oct 21.
Recent progress in theoretical systems biology, applied mathematics and computational statistics allows us to compare quantitatively how well different candidate models describe a particular biological system. Model selection has been applied with great success to problems where a small number of models (typically fewer than 10) are compared, but recent studies have started to consider thousands and even millions of candidate models. Often, however, we are left with sets of models that are all compatible with the data, and we can then use ensembles of models to make predictions. These ensembles can have very desirable characteristics but, as I show here, they are not guaranteed to improve on individual estimators or predictors. For model selection and network inference, I show when we can trust ensembles and when we should be cautious. The analyses suggest that the careful construction of an ensemble, that is, choosing good predictors, is of paramount importance, more so than had perhaps been realized before: merely adding different methods does not suffice. The success of ensemble network inference methods is also shown to rest on their ability to suppress false-positive results. A Jupyter notebook for assessing ensemble estimators is provided.
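The claim that ensembles are not guaranteed to improve on their members can be illustrated with a simple majority-vote calculation. The sketch below is not taken from the paper or its notebook. It assumes, purely for illustration, that an ensemble consists of n independent binary predictors, each correct with the same probability p, and that the ensemble reports the majority decision. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n predictors is correct.

    Illustrative assumptions (not from the source): the predictors are
    independent, each is correct with the same probability p, and n is
    odd so that ties cannot occur.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    # Sum the binomial probabilities of k_min, ..., n correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    # Compare ensembles of increasing size for good, uninformative and poor predictors.
    for p in (0.6, 0.5, 0.4):
        accuracies = [round(majority_vote_accuracy(p, n), 3) for n in (1, 5, 25)]
        print(f"p = {p}: ensemble accuracy for n = 1, 5, 25 -> {accuracies}")
```

Under these assumptions, adding predictors helps only when each one is better than chance (p > 0.5). With p < 0.5, the majority vote is worse than any single member, which is one simple way to see why choosing good predictors matters more than merely adding more of them. Real inference methods are typically correlated, so the independence assumption is an idealization.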