Oseku Elizabeth, Mariaria Petra Kerubo, Semakula Henry, Kahuma Clare Allelua, Balaba Martin, Naggirinya Agnes Bwanika, King Rachel Lisa, Parkes-Ratanshi Rosalind
Academy for Health Innovation Uganda, Infectious Diseases Institute, Makerere University, Kampala, Uganda.
Department of Epidemiology and Biostatistics, Institute for Global Health Sciences, University of California, San Francisco, CA, United States.
JMIR Res Protoc. 2025 Sep 9;14:e70005. doi: 10.2196/70005.
Sexually transmitted infections are a significant public health concern, particularly in sub-Saharan Africa, where their prevalence remains high. Promoting awareness and reducing stigma are essential strategies for addressing this challenge, but those affected often have limited access to accurate and culturally appropriate health information. Therefore, innovative solutions are essential to enhance sexual health literacy and encourage informed health-seeking behaviors. Artificial intelligence (AI)-enabled tools, such as chatbots, have emerged as promising avenues for delivering accurate and accessible health information. However, their potential is constrained by the lack of contextualized datasets, which are crucial for ensuring their effectiveness and relevance to diverse populations.
This study aims to develop an open access, contextualized dataset of question-and-answer pairs on sexual health and sexually transmitted infections to support the development and training of digital and AI-enabled health information tools.
Using a crowdsourcing approach, questions are being collected from participants aged ≥15 years via online platforms, paper-based submissions, and in-person interactions at public events across sub-Saharan Africa. Each question will be anonymized and reviewed by medical professionals, who will provide accurate, evidence-based answers. The dataset will then be processed, including cleaning and tagging for AI training, in adherence to the findability, accessibility, interoperability, and reusability (FAIR) principles. The final dataset will be published as open access.
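As a rough illustration of the processing steps described above (anonymization, tagging, and export in a reusable format), the following minimal Python sketch shows one way a single question-and-answer record could be represented. Everything in it is an assumption made for illustration: the field names, the `scrub` helper, the regular expressions, the tag values, and the sample question are hypothetical and are not taken from the study's actual pipeline or data.

```python
import json
import re
from dataclasses import dataclass, field, asdict

# Hypothetical patterns for illustration only; real anonymization of
# free-text health questions would require far more than two regexes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def scrub(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text.strip()


@dataclass
class QAPair:
    """One question-and-answer record (field names are assumptions)."""
    record_id: str
    question: str
    answer: str
    channel: str  # assumed values: "online", "paper", "in_person"
    tags: list[str] = field(default_factory=list)


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


# Invented sample record; the phone number is a made-up placeholder.
example = QAPair(
    record_id="q-0001",
    question=scrub("Can I get tested for free? Call me on +256 700 000000."),
    answer="(answer written by a medical professional)",
    channel="paper",
    tags=["testing", "access"],
)

print(to_jsonl([example]))
```

A line-oriented, plain-text format such as JSON Lines is one common choice for datasets intended for machine reuse; the study itself does not state which format the published dataset will use.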
Data collection began on June 12, 2024, and is ongoing. The data collection process was piloted in Kigali, Rwanda, where 132 questions were collected. As of August 2025, the study had collected more than 5620 question-and-answer pairs. In parallel, the collected data are undergoing rigorous processing in collaboration with health workers, who provide evidence-based answers to the questions and contribute new questions drawn from their clinical experience. Data cleaning and processing will enhance the utility of the data for AI applications.
The final dataset will be published as open access in 2025, contributing to the development of AI-driven health tools and promoting public health literacy.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/70005.