Sun Xiaotan, Nakashima Makiya, Nguyen Christopher, Chen Po-Hao, Tang W H Wilson, Kwon Deborah, Chen David
Cardiovascular Innovation Research Center, Cleveland Clinic, Cleveland, OH 44195, United States.
Heart Vascular and Thoracic Institute, Cleveland Clinic, Cleveland, OH 44195, United States.
J Am Med Inform Assoc. 2025 Aug 1;32(8):1299-1309. doi: 10.1093/jamia/ocaf095.
Fairness concerns stemming from known and unknown biases in healthcare practices have raised questions about the trustworthiness of Artificial Intelligence (AI)-driven Clinical Decision Support Systems (CDSS). Studies have shown that models exhibit unforeseen performance disparities across subpopulations when applied to clinical settings that differ from those seen in training. Existing unfairness mitigation strategies often struggle with scalability and accessibility, and their pursuit of group-level prediction performance parity does not effectively translate into fairness at the point of care. This study introduces FairICP, a flexible and cost-effective post-implementation framework based on Inductive Conformal Prediction (ICP), which provides users with actionable knowledge of model uncertainty due to subpopulation-level biases at the point of care.
FairICP applies ICP to identify the model's scope of competence through group-specific calibration, ensuring equitable prediction reliability by filtering out predictions that fall outside the trusted competence boundaries. We evaluated FairICP against four benchmarks on three medical imaging modalities: (1) Cardiac Magnetic Resonance Imaging (MRI), (2) Chest X-ray, and (3) Dermatology Imaging, acquired from both private and large public datasets. Frameworks were assessed on prediction performance enhancement and unfairness mitigation capability.
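The core mechanism described above can be sketched in a few lines. The snippet below is a minimal illustration of group-specific Inductive Conformal Prediction, not the authors' implementation: the per-group quantile rule, the miscoverage level `alpha`, and all function names are assumptions introduced for exposition.

```python
import numpy as np

def calibrate_by_group(scores, groups, alpha=0.1):
    """Compute a per-group nonconformity threshold from a calibration set.

    scores: nonconformity scores on held-out calibration data
            (e.g. 1 - predicted probability of the true class).
    groups: subpopulation label for each calibration example.
    alpha:  target miscoverage level (illustrative choice).
    """
    thresholds = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        n = len(s)
        # Standard split-conformal quantile: the ceil((n+1)(1-alpha))-th
        # smallest calibration score, clipped to the sample size.
        k = min(int(np.ceil((n + 1) * (1 - alpha))), n) - 1
        thresholds[g] = s[k]
    return thresholds

def within_competence(score, group, thresholds):
    """A prediction is trusted if its nonconformity score falls inside
    the calibrated competence boundary of the patient's subpopulation."""
    return score <= thresholds[group]
```

Because each subpopulation is calibrated against its own score distribution, a group on which the model is less reliable gets a correspondingly tighter competence region, which is the sense in which filtering equalizes prediction reliability across groups.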
Compared to the baseline, FairICP improved prediction accuracy by 7.2% and reduced the accuracy gap between the privileged and unprivileged subpopulations by 2.2% on average across all three datasets.
Our work provides a robust solution for promoting trust and transparency in AI-CDSS, fostering equality and equity in healthcare for diverse patient populations. Such post-processing methods are critical to enabling a robust framework for AI-CDSS implementation and monitoring in healthcare settings.