Institute for Systems and Robotics, LARSyS, Instituto Superior Técnico, Lisbon, Portugal.
Dermatology Service, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Nat Med. 2023 Aug;29(8):1941-1946. doi: 10.1038/s41591-023-02475-5. Epub 2023 Jul 27.
We investigated whether human preferences hold the potential to improve diagnostic artificial intelligence (AI)-based decision support using skin cancer diagnosis as a use case. We utilized nonuniform rewards and penalties based on expert-generated tables, balancing the benefits and harms of various diagnostic errors, which were applied using reinforcement learning. Compared with supervised learning, the reinforcement learning model improved the sensitivity for melanoma from 61.4% to 79.5% (95% confidence interval (CI): 73.5-85.6%) and for basal cell carcinoma from 79.4% to 87.1% (95% CI: 80.3-93.9%). AI overconfidence was also reduced while simultaneously maintaining accuracy. Reinforcement learning increased the rate of correct diagnoses made by dermatologists by 12.0% (95% CI: 8.8-15.1%) and improved the rate of optimal management decisions from 57.4% to 65.3% (95% CI: 61.7-68.9%). We further demonstrated that the reward-adjusted reinforcement learning model and a threshold-based model outperformed naïve supervised learning in various clinical scenarios. Our findings suggest the potential for incorporating human preferences into image-based diagnostic algorithms.
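To make the idea of nonuniform rewards and penalties concrete, the following minimal sketch shows how a reward table can change which diagnosis is preferred for the same predicted probabilities. It is an illustration only. It is not reinforcement learning and not the authors' implementation; it simply picks the diagnosis with the highest expected reward. The class list, all reward values, the example probabilities, and the function names are hypothetical and do not come from the study.

```python
# Hypothetical reward table, REWARDS[true_class][predicted_class].
# Values are invented for illustration; they are NOT the expert-generated
# tables from the study. Missing a melanoma is penalized most heavily.
CLASSES = ["melanoma", "basal_cell_carcinoma", "nevus"]

REWARDS = {
    "melanoma": {"melanoma": 5.0, "basal_cell_carcinoma": 1.0, "nevus": -10.0},
    "basal_cell_carcinoma": {"melanoma": 1.0, "basal_cell_carcinoma": 3.0, "nevus": -4.0},
    "nevus": {"melanoma": -2.0, "basal_cell_carcinoma": -1.0, "nevus": 1.0},
}


def expected_reward(probs, action):
    """Expected reward of predicting `action`, averaged over the true classes."""
    return sum(probs[true_class] * REWARDS[true_class][action] for true_class in CLASSES)


def argmax_diagnosis(probs):
    """Baseline: pick the most probable class, treating all errors as equal."""
    return max(CLASSES, key=lambda c: probs[c])


def reward_adjusted_diagnosis(probs):
    """Pick the class whose prediction has the highest expected reward."""
    return max(CLASSES, key=lambda a: expected_reward(probs, a))


# Hypothetical classifier output for one lesion image.
probs = {"melanoma": 0.30, "basal_cell_carcinoma": 0.10, "nevus": 0.60}

print(argmax_diagnosis(probs))           # most probable class
print(reward_adjusted_diagnosis(probs))  # class with the best expected reward
```

With these made-up numbers, the most probable class is the benign nevus, while the reward-adjusted choice is melanoma, because a missed melanoma carries a much larger penalty than a false alarm. This trades some specificity for higher sensitivity on the malignant class, which matches the direction of the sensitivity gains reported in the abstract.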