Heyard Rachel, Pina David G, Buljan Ivan, Marušić Ana
Center for Reproducible Science, Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland.
European Research Executive Agency, European Commission, Brussels, Belgium.
PLoS One. 2025 Mar 24;20(3):e0317772. doi: 10.1371/journal.pone.0317772. eCollection 2025.
Funding agencies rely on panel or consensus meetings to summarise individual evaluations of grant proposals into a final ranking. However, previous research has shown that such meetings can produce inconsistent decisions and are often inefficient. Using data from the Marie Skłodowska-Curie Actions, we investigated the differences between an algorithmic approach that summarises the information in the individual evaluations of grant proposals and the decisions reached after consensus meetings, and we present an exploratory comparative analysis. The algorithmic approach was a Bayesian hierarchical model that produces a Bayesian ranking of the proposals from the individual evaluation reports submitted before the consensus meeting. The parameters of the Bayesian hierarchical model and the resulting ranking were compared to the scores, rankings and decisions recorded in the consensus meeting reports. We examined in detail the evaluations of 1,006 proposals submitted to three panels (Life Science, Mathematics, Social Sciences and Humanities) in two call years (2015 and 2019). Overall, we found large discrepancies between the consensus reports and the scores the Bayesian hierarchical model would have predicted. The discrepancies were less pronounced when the scores were aggregated into funding rankings or decisions. Agreement between the final funding ranking and the Bayesian ranking was best for funding schemes with very low success rates. While we set out to determine whether algorithmic approaches that summarise individual evaluation scores could replace consensus meetings, we conclude that the individual scores assigned before the consensus meetings are currently not useful for predicting the final funding outcomes of the proposals. Based on our results, we suggest using the individual evaluations for triage, so that the weakest proposals are not discussed in panel or consensus meetings. This would allow a more nuanced evaluation of a smaller set of proposals and help minimise the uncertainty and biases involved in allocating funding.
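To illustrate the general idea of the algorithmic approach described above (not the authors' actual model or data), the following is a minimal sketch of a hierarchical score model: each observed score is treated as proposal quality plus a reviewer-specific bias plus noise, reviewer biases are estimated and removed, and de-biased proposal means are shrunk toward the grand mean before ranking. All quantities here are simulated and the simple moment-based estimator is an assumption for illustration; the paper's Bayesian hierarchical model is fit very differently.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: n_prop proposals, each scored by 3 of n_rev reviewers.
n_prop, n_rev = 20, 10
true_quality = rng.normal(0.0, 1.0, n_prop)   # latent proposal quality
rev_bias = rng.normal(0.0, 0.5, n_rev)        # reviewer leniency/harshness

rows = []
for i in range(n_prop):
    for j in rng.choice(n_rev, size=3, replace=False):
        score = true_quality[i] + rev_bias[j] + rng.normal(0.0, 0.3)
        rows.append((i, j, score))
prop_idx, rev_idx, scores = (np.array(c) for c in zip(*rows))

# Simple moment-based hierarchical estimate (a sketch, not the paper's model):
# 1) estimate each reviewer's bias as their mean deviation from the grand mean,
# 2) remove that bias from the scores,
# 3) shrink the per-proposal means toward the overall mean.
grand = scores.mean()
bias_hat = np.array([scores[rev_idx == j].mean() - grand for j in range(n_rev)])
adjusted = scores - bias_hat[rev_idx]

prop_means = np.array([adjusted[prop_idx == i].mean() for i in range(n_prop)])
shrink = 0.8  # illustrative shrinkage factor toward the grand mean
quality_hat = grand + shrink * (prop_means - grand)

ranking = np.argsort(-quality_hat)  # best-ranked proposal first
```

A triage rule, as suggested in the abstract, could then discard the bottom of `ranking` before any panel discussion, so that the meeting concentrates on the smaller set of competitive proposals.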