Kaplan Shimon, Handelman Doron, Handelman Amir
Department of Electrical Engineering, Faculty of Engineering, Holon Institute of Technology, Holon, Israel.
Givatayim, Israel.
AI Ethics. 2021;1(4):425-434. doi: 10.1007/s43681-021-00049-0. Epub 2021 Mar 23.
Artificial intelligence (AI) systems are extensively used today in many fields. In medicine, AI systems are used especially for the segmentation and classification of medical images. As reliance on such AI systems increases, it is important to verify that they are dependable and not sensitive to bias or to other types of errors that may severely affect users and patients. This work investigates the sensitivity of AI-system performance to labeling errors. The investigation is performed by simulating intentional mislabeling of training images according to different values of a new parameter called "mislabeling balance" and of a "corruption" parameter, and then measuring the accuracy of the AI systems for every value of these parameters. The issues investigated include the amount (percentage) of errors at which a substantial adverse effect on the performance of the AI systems can be observed, and the ways in which unreliable labeling can occur in the training stage. The goals of this work are to raise ethical concerns regarding the various types of errors that can find their way into AI systems, to demonstrate the effect of training errors, and to encourage the development of techniques that can cope with such errors, especially in AI systems that perform sensitive medical-related tasks.
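The abstract does not define the two parameters precisely, but the described setup can be sketched as follows. This is a minimal illustration, assuming binary labels and interpreting "corruption" as the overall fraction of training labels flipped and "mislabeling balance" as the share of those flips drawn from class 0 (the rest from class 1); the function name and signature are hypothetical, not taken from the paper.

```python
import random

def corrupt_labels(labels, corruption=0.1, balance=0.5, seed=0):
    """Simulate intentional mislabeling of a binary label list.

    corruption: fraction of all labels to flip (assumed meaning).
    balance:    share of the flips taken from class 0; the remainder
                is taken from class 1 (assumed meaning).
    """
    rng = random.Random(seed)
    labels = list(labels)
    n_flips = int(round(corruption * len(labels)))
    n_from_0 = int(round(balance * n_flips))

    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]

    # Flip the chosen number of labels in each class.
    for i in rng.sample(idx0, min(n_from_0, len(idx0))):
        labels[i] = 1
    for i in rng.sample(idx1, min(n_flips - n_from_0, len(idx1))):
        labels[i] = 0
    return labels
```

Sweeping `corruption` and `balance` over a grid, retraining the classifier on each corrupted label set, and recording test accuracy would reproduce the kind of sensitivity measurement the abstract describes.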