Generating Reward Functions Using IRL Towards Individualized Cancer Screening.

Author information

Petousis Panayiotis, Han Simon X, Hsu William, Bui Alex A T

Affiliations

UCLA Bioengineering Department, Los Angeles, CA 90095, USA.

UCLA Department of Radiological Sciences, Los Angeles, CA 90095, USA.

Publication information

Artif Intell Health (2018). 2019;11326:213-227. doi: 10.1007/978-3-030-12738-1_16. Epub 2019 Feb 21.

DOI: 10.1007/978-3-030-12738-1_16

PMID: 31363717

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6667225/

Abstract

Cancer screening can benefit from individualized decision-making tools that decrease overdiagnosis. The heterogeneity of cancer screening participants underscores the need for more personalized methods. Partially observable Markov decision processes (POMDPs), when defined with an appropriate reward function, can be used to suggest optimal, individualized screening policies. However, determining an appropriate reward function can be challenging. Here, we propose the use of inverse reinforcement learning (IRL) to form reward functions for lung and breast cancer screening POMDPs. Using experts' (physicians') retrospective screening decisions for lung and breast cancer screening, we developed two POMDP models with corresponding reward functions. Specifically, the maximum entropy (MaxEnt) IRL algorithm with an adaptive step size was employed to learn rewards more efficiently, and it was combined with a multiplicative model to learn state-action pair rewards for a POMDP. The POMDP screening models were evaluated on their ability to recommend appropriate screening decisions before the diagnosis of cancer. The reward functions learned with the MaxEnt IRL algorithm, when combined with POMDP models in lung and breast cancer screening, demonstrate performance comparable to that of experts. Cohen's Kappa agreement between the POMDPs' and physicians' decisions was high in breast cancer and showed a decreasing trend in lung cancer.
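The abstract names MaxEnt IRL with an adaptive step size but gives no algorithmic details. The following is a minimal sketch of the generic MaxEnt IRL gradient loop (a backward soft value-iteration pass, a forward state-visitation pass, and a gradient equal to expert minus expected feature counts), under loudly stated assumptions: a small, fully observable tabular MDP instead of the paper's POMDP; a linear state-only reward r(s) = θ·φ(s) instead of the paper's multiplicative state-action model; and a hypothetical "grow the step while successive gradients agree, shrink it otherwise" rule, which is not necessarily the authors' adaptive scheme. All function names, parameters, and the toy two-state example are illustrative.

```python
import math
from typing import List, Sequence

# Minimal MaxEnt IRL sketch. Assumptions: fully observable tabular MDP (NOT the
# paper's POMDP); linear state rewards r(s) = theta . phi(s) (NOT the paper's
# multiplicative state-action model); hypothetical adaptive step-size rule.


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def expected_feature_counts(theta, features, transitions, start_dist, horizon):
    """Expected feature counts under the soft-optimal policy for reward theta.

    transitions[s][a][s2] = P(s2 | s, a); features[s] is the feature vector of s.
    """
    n_states, n_actions = len(features), len(transitions[0])
    reward = [sum(w * f for w, f in zip(theta, features[s])) for s in range(n_states)]

    # Backward pass: soft Bellman backups; no action is taken at the last step.
    value = reward[:]
    policies = []
    for _ in range(horizon - 1):
        q = [[reward[s] + sum(transitions[s][a][s2] * value[s2] for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        value = [logsumexp(q[s]) for s in range(n_states)]
        policies.append([[math.exp(q[s][a] - value[s]) for a in range(n_actions)]
                         for s in range(n_states)])
    policies.reverse()  # policies[t] is the stochastic policy used at step t

    # Forward pass: propagate the state-visitation distribution and sum features.
    dist, counts = list(start_dist), [0.0] * len(theta)
    for t in range(horizon):
        for s in range(n_states):
            for k, f in enumerate(features[s]):
                counts[k] += dist[s] * f
        if t < horizon - 1:
            nxt = [0.0] * n_states
            for s in range(n_states):
                for a in range(n_actions):
                    for s2 in range(n_states):
                        nxt[s2] += dist[s] * policies[t][s][a] * transitions[s][a][s2]
            dist = nxt
    return counts


def maxent_irl(features, transitions, demos, horizon, iters=500, step=0.1,
               grow=1.1, shrink=0.5, max_step=1.0, tol=1e-6) -> List[float]:
    """Gradient ascent on the MaxEnt log-likelihood of the expert demonstrations."""
    n_states, n_feat = len(features), len(features[0])
    empirical, start = [0.0] * n_feat, [0.0] * n_states
    for traj in demos:  # each trajectory is a list of `horizon` state indices
        start[traj[0]] += 1.0 / len(demos)
        for s in traj:
            for k, f in enumerate(features[s]):
                empirical[k] += f / len(demos)

    theta, prev_grad = [0.0] * n_feat, None
    for _ in range(iters):
        expected = expected_feature_counts(theta, features, transitions, start, horizon)
        grad = [e - x for e, x in zip(empirical, expected)]  # expert minus model counts
        if math.sqrt(sum(g * g for g in grad)) < tol:
            break
        # Hypothetical adaptive step: grow while successive gradients agree, else shrink.
        if prev_grad is not None:
            agree = sum(g * p for g, p in zip(grad, prev_grad)) > 0
            step = min(step * grow, max_step) if agree else step * shrink
        theta = [w + step * g for w, g in zip(theta, grad)]
        prev_grad = grad
    return theta


# Toy example: 2 states, action a moves deterministically to state a.
transitions = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
features = [[1.0, 0.0], [0.0, 1.0]]  # one-hot state features
demos = [[0, 1, 1, 1], [0, 1, 1, 1]]  # "experts" move to state 1 and stay
theta = maxent_irl(features, transitions, demos, horizon=4)
print(theta[1] > theta[0])  # True: the learned reward favors state 1
```

Moving to state-action rewards or to a belief-state POMDP would change both the backward and forward passes; this sketch does not attempt to reproduce those parts of the paper.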

Similar articles

Prescription of Controlled Substances: Benefits and Risks

Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.

Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.

Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.

Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.

Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm.

Clin Orthop Relat Res. 2024 Jan 1;482(1):143-157. doi: 10.1097/CORR.0000000000002706. Epub 2023 Jun 12.

Can We Enhance Shared Decision-making for Periacetabular Osteotomy Surgery? A Qualitative Study of Patient Experiences.

Clin Orthop Relat Res. 2025 Jan 1;483(1):120-136. doi: 10.1097/CORR.0000000000003198. Epub 2024 Jul 23.

Sexual Harassment and Prevention Training

Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?

Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.

Variation within and between digital pathology and light microscopy for the diagnosis of histopathology slides: blinded crossover comparison study.

Health Technol Assess. 2025 Jul;29(30):1-75. doi: 10.3310/SPLK4325.

Plug-and-play use of tree-based methods: consequences for clinical prediction modeling.

J Clin Epidemiol. 2025 Aug;184:111834. doi: 10.1016/j.jclinepi.2025.111834. Epub 2025 May 19.

Cited by

Using Sequential Decision Making to Improve Lung Cancer Screening Performance.

IEEE Access. 2019;7:119403-119419. doi: 10.1109/ACCESS.2019.2935763. Epub 2019 Aug 16.
