Department of Computer Science, Harvard University, 29 Oxford Street, Cambridge, MA, 02138, USA.
Center for Quantitative Health, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114, USA.
Transl Psychiatry. 2021 Feb 4;11(1):108. doi: 10.1038/s41398-021-01224-x.
Decision support systems embodying machine learning models offer the promise of an improved standard of care for major depressive disorder, but little is known about how clinicians' treatment decisions will be influenced by machine learning recommendations and explanations. We used a within-subject factorial experiment to present 220 clinicians with patient vignettes, each with or without a machine-learning (ML) recommendation and one of several forms of explanation. We found that interacting with ML recommendations did not significantly improve clinicians' treatment selection accuracy, assessed as concordance with expert psychopharmacologist consensus, compared with baseline scenarios in which clinicians made treatment decisions independently. Interacting with incorrect recommendations paired with explanations that included limited but easily interpretable information did, however, lead to a significant reduction in treatment selection accuracy relative to baseline. These results suggest that incorrect ML recommendations may adversely impact clinician treatment selections and that explanations are insufficient for addressing overreliance on imperfect ML algorithms. More generally, our findings challenge the common assumption that clinicians interacting with ML tools will perform better than either clinicians or ML algorithms alone.
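The primary outcome, treatment selection accuracy, was defined as concordance with expert psychopharmacologist consensus, tabulated separately for each experimental condition. Below is a minimal sketch of how such per-condition concordance could be computed; the record fields, condition labels, and example data are hypothetical illustrations, not the authors' analysis code.

```python
# Hedged sketch (not the authors' analysis code): treatment selection
# accuracy as concordance with expert consensus, grouped by condition.
# All field names and values below are hypothetical.
from collections import defaultdict

responses = [
    {"clinician": 1, "condition": "baseline",             "choice": "sertraline", "consensus": "sertraline"},
    {"clinician": 1, "condition": "ml_correct+explain",   "choice": "bupropion",  "consensus": "bupropion"},
    {"clinician": 1, "condition": "ml_incorrect+explain", "choice": "paroxetine", "consensus": "sertraline"},
]

hits = defaultdict(int)    # concordant selections per condition
totals = defaultdict(int)  # total selections per condition
for r in responses:
    totals[r["condition"]] += 1
    hits[r["condition"]] += int(r["choice"] == r["consensus"])

for cond in sorted(totals):
    print(f"{cond}: accuracy = {hits[cond] / totals[cond]:.2f}")
```

Comparing the per-condition accuracies against the baseline condition corresponds to the contrasts reported in the abstract (e.g., incorrect recommendations with simple explanations versus independent decisions).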