Sankarnarayanan Tharangini, Paciorkowski Lev, Parikh Khevna, Hamilton-Fletcher Giles, Feng Chen, Sheng Diwei, Hudson Todd E, Rizzo John-Ross, Chan Kevin C
Annu Int Conf IEEE Eng Med Biol Soc. 2023 Jul;2023:1-4. doi: 10.1109/EMBC40787.2023.10340454.
Recent object detection models show promising advances in architecture and performance, expanding potential applications for persons with blindness or low vision (pBLV). However, object detection models are usually trained on generic data rather than on datasets that focus on the needs of pBLV. Applications that locate objects of interest to pBLV therefore need models trained specifically for this purpose. Informed by prior interviews, questionnaires, and Microsoft's ORBIT research, we identified thirty-five objects pertinent to pBLV. We used this user-centric feedback to gather images of these objects from the Google Open Images V6 dataset, and then trained a YOLOv5x model on this dataset to recognize the objects of interest. We demonstrate that the model can identify objects that previous generic models could not, such as those related to tasks of daily functioning (e.g., coffee mug, knife, fork, and glass). Crucially, we show that careful pruning of a dataset with severe class imbalances leads to a rapid, noticeable two-fold improvement in the overall performance of the model, as measured by the mean average precision over intersection-over-union thresholds from 0.5 to 0.95 (mAP50-95). Specifically, mAP50-95 improved from 0.14 to 0.36 on the seven least prevalent classes in the training dataset. Overall, we show that careful curation of training data can improve training speed and object detection outcomes, and we give clear directions for customizing training data to create models that focus on the desires and needs of pBLV.
Clinical Relevance: This work demonstrates the benefits of developing assistive AI technology customized to individual users or the wider BLV community.
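For readers unfamiliar with the mAP50-95 metric cited in the abstract, the following is a minimal Python sketch of its two building blocks: the intersection over union (IoU) of two boxes, and the averaging of precision scores over the ten IoU thresholds from 0.5 to 0.95 in steps of 0.05. The function names, the `(x1, y1, x2, y2)` box layout, and the `ap_per_threshold` input are assumptions made for illustration; the abstract does not describe the authors' evaluation code, and computing the per-threshold average precision itself is not shown here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner-based box layout is an assumption for this sketch.
    """
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95 behind "mAP50-95".
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def map50_95(ap_per_threshold):
    """Average the class-mean AP over all ten IoU thresholds.

    `ap_per_threshold` is a hypothetical dict mapping each threshold in
    IOU_THRESHOLDS to the mean average precision at that threshold.
    """
    total = sum(ap_per_threshold[t] for t in IOU_THRESHOLDS)
    return total / len(IOU_THRESHOLDS)
```

A detection counts as correct at a given threshold only if its IoU with a ground-truth box of the same class meets that threshold, so mAP50-95 rewards tight localization more than mAP at 0.5 alone.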