How much deep learning is enough for automatic identification to be reliable?

Publication information

Angle Orthod. 2020 Nov 1;90(6):823-830. doi: 10.2319/021920-116.1.

Abstract

OBJECTIVES

To determine the optimal quantity of learning data needed to develop artificial intelligence (AI) that can automatically identify cephalometric landmarks.

MATERIALS AND METHODS

A total of 2400 cephalograms were collected, and 80 landmarks were manually identified by a human examiner. Of these, 2200 images were chosen as the learning data to train AI, and the remaining 200 images were used as the test data. Learning sets of eight sizes (50, 100, 200, 400, 800, 1200, 1600, and 2000 images) were drawn by random sampling without replacement and crossed with three numbers of detection targets per image (19, 40, and 80), giving 24 combinations for the AI training procedures. The training procedures were repeated four times, producing a total of 96 different AIs. The accuracy of each AI was evaluated in terms of radial error.
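The abstract does not define radial error. A minimal sketch, assuming the usual meaning in landmark detection (the Euclidean distance between a predicted landmark and the examiner's reference position), could look like this. The function names and the coordinate format are illustrative assumptions, not the authors' code.

```python
import math

def radial_error(pred, truth):
    """Euclidean distance between a predicted and a reference landmark.

    Assumes each landmark is an (x, y) pair in the same units (e.g. mm).
    """
    return math.hypot(pred[0] - truth[0], pred[1] - truth[1])

def mean_radial_error(preds, truths):
    """Average radial error over paired lists of landmarks."""
    errors = [radial_error(p, t) for p, t in zip(preds, truths)]
    return sum(errors) / len(errors)
```

Averaging over all landmarks and test images gives one accuracy figure per trained model, which is one plausible way to compare the models against each other.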

RESULTS

The accuracy of AI increased linearly with the logarithm of the number of learning data sets and decreased as the number of detection targets increased. To estimate the optimal quantity of learning data, a prediction model was built. At least 2300 sets of learning data appeared to be necessary to develop AI as accurate as human examiners.
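The abstract does not give the form of the prediction model. As a rough illustration of how a log-linear trend can be turned into a required sample size, the sketch below fits error = a + b * log10(n) by ordinary least squares and solves for the n that reaches a target error. This is a simplified single-variable assumption that ignores the number of detection targets, and the numbers are synthetic placeholders chosen for easy arithmetic, not values from the study.

```python
import math

def fit_log_linear(n_values, errors):
    """Least-squares fit of error = a + b * log10(n); returns (a, b)."""
    xs = [math.log10(n) for n in n_values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(errors) / len(errors)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, errors))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b

def required_n(a, b, target_error):
    """Invert the fitted line to find n at which error equals target_error."""
    return 10 ** ((target_error - a) / b)

# Synthetic placeholder data lying exactly on error = 4 - log10(n).
n_values = [10, 100, 1000]
errors = [3.0, 2.0, 1.0]

a, b = fit_log_linear(n_values, errors)
n_needed = required_n(a, b, target_error=0.5)  # hypothetical target error
```

With real measurements, the target error would be a benchmark for human examiners, and the inverted line would give the kind of estimate the abstract reports.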

CONCLUSIONS

A considerably large quantity of learning data was necessary to develop accurate AI. The present study might provide a basis to determine how much learning data would be necessary in developing AI.
