
Calibrating a Transformer-Based Model's Confidence on Community-Engaged Research Studies: Decision Support Evaluation Study.

Author Information

Ferrell Brian, Raskin Sarah E, Zimmerman Emily B

Affiliations

Virginia Commonwealth University, Richmond, VA, United States.

L. Douglas Wilder School of Government and Public Affairs, Virginia Commonwealth University, Richmond, VA, United States.

Publication Information

JMIR Form Res. 2023 Mar 20;7:e41516. doi: 10.2196/41516.

Abstract

BACKGROUND

Deep learning offers great benefits in classification tasks such as medical imaging diagnostics or stock trading, especially when compared with human-level performance, and can be a viable option for classifying distinct levels within community-engaged research (CEnR). CEnR is a collaborative approach between academics and community partners that aims to conduct research relevant to community needs while incorporating diverse forms of expertise. In deep learning and artificial intelligence (AI), training multiple models to obtain the highest validation accuracy is common practice; however, such models can overfit to that specific data set and fail to generalize to a real-world population, creating issues of bias and potentially dangerous algorithmic decisions. Consequently, if we plan to automate human decision-making, we need techniques and exhaustive evaluation processes for these powerful, unexplainable models to ensure that we do not incorporate and blindly trust poor AI models to make real-world decisions.

OBJECTIVE

We aimed to conduct an evaluation study to determine whether our most accurate transformer-based models from previous studies could emulate our own classification spectrum for tracking CEnR studies, and whether the use of calibrated confidence scores was meaningful.

METHODS

We compared the classifications of 3 domain experts, who labeled a sample of 45 studies drawn from our university's institutional review board database, with those of 3 previously trained transformer-based models, and investigated whether calibrated confidence scores are a viable technique for using AI in a support role within complex decision-making systems.

RESULTS

Our findings reveal that certain models overestimate their performance through high confidence scores despite not achieving the highest validation accuracy.

CONCLUSIONS

Future studies should use larger sample sizes to generalize the results more effectively. Although our study addresses concerns of bias and overfitting in deep learning models, methods that allow domain experts to trust these models more warrant further exploration. A calibrated confidence score can be a misleading metric when determining an AI model's level of competency.
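To make the abstract's central idea concrete: calibration adjusts a model's raw confidence so that it better reflects true accuracy, and an overconfident model is one whose top probability is routinely higher than its hit rate. The sketch below is not the authors' method; it illustrates one common calibration technique, temperature scaling, on hypothetical logits for a 3-level CEnR classification. The logit values and the temperature of 2.0 are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature > 1 softens (reduces) confidence."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for one study scored across 3 CEnR levels.
logits = np.array([4.0, 1.0, 0.5])

raw = softmax(logits)                           # uncalibrated probabilities
calibrated = softmax(logits, temperature=2.0)   # temperature-scaled probabilities

# The predicted class is unchanged, but the reported confidence is lower.
print(f"raw confidence:        {raw.max():.3f}")
print(f"calibrated confidence: {calibrated.max():.3f}")
```

Because temperature scaling divides all logits by the same constant, it never changes which class is predicted; it only reshapes the confidence, which is why an overconfident model can look well calibrated on one data set yet still mislead on another, as the study's results suggest.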


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5227/10131979/ec7cd5377959/formative_v7i1e41516_fig1.jpg
