Currie Geoffrey, Hewis Johnathan, Hawk Elizabeth, Rohren Eric
Charles Sturt University, Wagga Wagga, New South Wales, Australia;
Baylor College of Medicine, Houston, Texas.
J Nucl Med Technol. 2024 Dec 4;52(4):356-359. doi: 10.2967/jnmt.124.268332.
Generative artificial intelligence (AI) text-to-image production could reinforce or amplify gender and ethnicity biases. Several text-to-image generative AI tools are used to produce images representing the medical imaging professions. White male stereotyping and masculine cultures can dissuade women and ethnically diverse people from entering a profession. In March 2024, DALL-E 3, Firefly 2, Stable Diffusion 2.1, and Midjourney 5.2 were used to generate a series of individual and group images of medical imaging professionals: radiologist, nuclear medicine physician, radiographer, and nuclear medicine technologist. Multiple iterations of images were generated using a variety of prompts. Collectively, 184 images were produced for evaluation of 391 characters. All images were independently analyzed by 3 reviewers for apparent gender and skin tone. Collectively (individual and group characters) (n = 391), 60.6% of characters were male and 87.7% had a light skin tone. DALL-E 3 (65.6%), Midjourney 5.2 (76.7%), and Stable Diffusion 2.1 (56.2%) had a statistically higher representation of men than Firefly 2 (42.9%) (P < 0.0001). With Firefly 2, 70.3% of characters had light skin tones, which was statistically lower (P < 0.0001) than for Stable Diffusion 2.1 (84.8%), Midjourney 5.2 (100%), and DALL-E 3 (94.8%). Overall, image quality metrics were average or better for 87.2% of DALL-E 3 images and 86.2% of Midjourney 5.2 images, whereas 50.9% of Firefly 2 images and 86.0% of Stable Diffusion 2.1 images were inadequate or poor. Generative AI text-to-image generation using DALL-E 3 via GPT-4 had the best overall quality compared with Firefly 2, Midjourney 5.2, and Stable Diffusion 2.1. Nonetheless, DALL-E 3 includes inherent biases associated with gender and ethnicity that demand more critical evaluation.
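The analysis described above rests on a simple tally: each generated character is rated independently by 3 reviewers for apparent gender and skin tone, a consensus label is taken per character, and the proportions of male and light-skin-tone characters are computed. A minimal sketch of that tallying step is below; the reviewer ratings here are illustrative placeholders, not data from the study.

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label among independent reviewer ratings."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical ratings for a handful of generated characters: each was
# rated by 3 reviewers for apparent gender and skin tone (illustrative
# values only, not the study's data).
characters = [
    {"gender": ["male", "male", "male"], "skin": ["light", "light", "light"]},
    {"gender": ["female", "female", "male"], "skin": ["light", "dark", "light"]},
    {"gender": ["male", "male", "female"], "skin": ["dark", "dark", "dark"]},
]

gender_consensus = [majority_label(c["gender"]) for c in characters]
skin_consensus = [majority_label(c["skin"]) for c in characters]

pct_male = 100 * gender_consensus.count("male") / len(gender_consensus)
pct_light = 100 * skin_consensus.count("light") / len(skin_consensus)
print(f"male: {pct_male:.1f}%, light skin tone: {pct_light:.1f}%")
```

With per-tool counts tallied this way, the between-tool comparisons reported in the abstract (e.g., male representation in DALL-E 3 versus Firefly 2) would follow from a standard test of proportions such as a chi-square test.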