A Validation Study of a Deep Learning-Based Doping Drug Text Recognition System to Ensure Safe Drug Use among Athletes.

Author information

Lee Sang-Yong, Park Jae-Hyeon, Yoon Jiwun, Lee Ji-Yong

Affiliation

Center for Sports and Performance Analysis, Korea National Sport University, Seoul 05541, Republic of Korea.

Publication information

Healthcare (Basel). 2023 Jun 15;11(12):1769. doi: 10.3390/healthcare11121769.

DOI:10.3390/healthcare11121769

PMID:37372885

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10297893/

Abstract

This study aimed to develop an English-language doping drug recognition system using deep learning-based optical character recognition (OCR). A database of 336 banned substances was built from the World Anti-Doping Agency's International Standard Prohibited List and the Korean Pharmaceutical Information Center's Drug Substance Information. Accuracy and validity were analyzed on 886 drug substance images, including 152 images of prescriptions and drug substance labels collected using data augmentation. The resulting hybrid system, based on the Tesseract OCR model, can be accessed from both a smartphone and a website. Of the 5379 words extracted, the system made character recognition errors in 91, a word-level accuracy of 98.3%. The system correctly classified all 624 images for acceptable substances and 218 images for banned substances, and incorrectly recognized 44 of the banned substances as acceptable. The validity analysis showed high accuracy (0.95), sensitivity (1.00), and specificity (0.93), supporting the system's validity. The system could allow athletes who lack knowledge about doping to check quickly and accurately whether they are taking banned substances. It may also serve as an efficient means of supporting the development of a fair and healthy sports culture.
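
The two accuracy figures in the abstract can be checked directly from the reported counts. The short Python sketch below does only that arithmetic; the function and variable names are illustrative and are not taken from the study. Sensitivity and specificity are left out because the abstract does not give the full confusion-matrix layout needed to recompute them.

```python
def proportion_correct(correct: int, total: int) -> float:
    """Return the fraction of items that were handled correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Counts as reported in the abstract (variable names are illustrative).
words_extracted = 5379        # words extracted by OCR
words_misread = 91            # words with character recognition errors
images_total = 886            # images used in the validity analysis
acceptable_correct = 624      # acceptable-substance images classified correctly
banned_correct = 218          # banned-substance images classified correctly
images_misclassified = 44     # images assigned to the wrong class

# Word-level OCR accuracy: share of extracted words read without error.
word_accuracy = proportion_correct(words_extracted - words_misread, words_extracted)

# Overall classification accuracy: share of images assigned to the right class.
image_accuracy = proportion_correct(acceptable_correct + banned_correct, images_total)

print(f"word-level OCR accuracy: {word_accuracy:.1%}")
print(f"image classification accuracy: {image_accuracy:.2f}")
```

Run as written, this reproduces the reported 98.3% word-level accuracy and the 0.95 classification accuracy; the three image counts also sum to the 886 images analyzed.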
