Rullo Ryan, Maatouk Ali, Huang Tinglin, Chen Jialin, Qiu Weikang, O'Connor Giselle, Womack Julie, Sadak Tatiana, Rodriguez Christine, de Jesus Espinosa Tania, Carneiro Pedro, Marshall Ami, Ying Rex, Ramos S Raquel
School of Nursing, Yale University, 400 West Campus Drive, Orange, US.
Department of Computer Science, School of Engineering & Applied Science, Yale University, New Haven, US.
J Med Internet Res. 2025 Aug 11. doi: 10.2196/77053.
The integration of artificial intelligence (AI) in healthcare presents a significant opportunity to revolutionize patient care. In the United States, an estimated 129 million people have at least one chronic illness, with 42% having two or more. Although chronic illness is largely preventable, its prevalence is expected to rise and to impose significant economic burdens and financial toxicity on healthcare consumers. We leveraged an interdisciplinary team encompassing nursing, public health, and computer science to optimize health through prevention education for cardiovascular and metabolic comorbidities in persons living with HIV. In this tutorial, we describe the iterative, data-based development and evaluation of an intersectionality-informed large language model (LLM) designed to support patient teaching in this population. First, we curated data by scraping publicly available, authoritative, evidence-based sources to capture a comprehensive dataset, supplemented by publicly available HIV forum content. Second, we benchmarked candidate LLMs and generated a fine-tuning dataset using GPT-4 through multi-turn question-answer conversations, employing standardized metrics to assess baseline model performance. Third, we iteratively refined the selected model via Low-Rank Adaptation and reinforcement learning, integrating quantitative metrics with qualitative expert evaluations. Pre-existing LLMs demonstrated poor n-gram agreement, divergence from model answers (Accuracy 4.16, Readability 4.63, Professionalism 4.58), and poor readability (Kincaid 8.54, Jargon 4.44). After prompt adjustments and fine-tuning, preliminary results demonstrate the potential of a customized LLaMA-based LLM to provide personalized, culturally salient patient education. We present a data-based, step-by-step tutorial for the interdisciplinary development of CARDIO, a specialized LLM for cardiovascular health education in HIV care.
Through comprehensive data curation and scraping, systematic benchmarking, and a dual-stage fine-tuning pipeline, CARDIO's performance improved markedly (Accuracy 5.0, Readability 4.98, Professionalism 4.98, Kincaid 7.17, Jargon 2.92). Although patient pilot testing is still pending, our results demonstrate that targeted data curation, rigorous benchmarking, and iterative fine-tuning have provided a robust evaluation of the model's potential. By building an LLM tailored to cardiovascular health promotion and patient education, this work lays the foundation for innovative AI-driven strategies to manage comorbid conditions in people living with HIV.
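The "Kincaid" scores reported above are presumably Flesch-Kincaid grade levels, although the abstract does not state which implementation was used. As a minimal sketch under that assumption, the standard formula (0.39 × words per sentence + 11.8 × syllables per word − 15.59) can be computed as follows. The function names and the vowel-group syllable heuristic are illustrative choices, not the authors' method; dedicated readability libraries count syllables more carefully and will give somewhat different values.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum one."""
    vowel_groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(vowel_groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level using the standard published coefficients."""
    # Sentences are split on runs of terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    plain = "The cat sat. The dog ran."
    clinical = "Antiretroviral therapy reduces cardiovascular complications."
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(clinical), 2))
```

A lower score indicates text readable at a lower school grade, which is the direction of the reported change from 8.54 to 7.17.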