van der Wal Douwe, Jhun Iny, Laklouk Israa, Nirschl Jeff, Richer Lara, Rojansky Rebecca, Theparee Talent, Wheeler Joshua, Sander Jörg, Feng Felix, Mohamad Osama, Savarese Silvio, Socher Richard, Esteva Andre
Salesforce AI Research, 575 High St, Palo Alto, CA, 94301, USA.
Stanford University, 450 Serra Mall, Stanford, CA, 94305, USA.
NPJ Digit Med. 2021 Oct 7;4(1):145. doi: 10.1038/s41746-021-00520-6.
Biology has become a prime area for the deployment of deep learning and artificial intelligence (AI), enabled largely by the massive data sets that the field can generate. Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models. In microscopy, it is easy to generate image data sets containing millions of cells and structures; obtaining large-scale, high-quality annotations for AI models, however, remains challenging. Here, we present HALS (Human-Augmenting Labeling System), a human-in-the-loop data-labeling AI that begins uninitialized and learns annotations from a human in real time. Using a multi-part AI composed of three deep learning models, HALS learns from just a few examples and immediately decreases the annotator's workload while increasing the quality of their annotations. On a highly repetitive use case, annotating cell types, and in experiments with seven pathologists (experts in the microscopic analysis of biological specimens), we demonstrate a manual work reduction of 90.60% and an average data-quality boost of 4.34%, measured across four use cases and two tissue stain types.
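The abstract does not specify HALS's three models or how "manual work reduction" is computed, so the following is only a minimal sketch of the general human-in-the-loop idea under stated assumptions: a single hypothetical online nearest-centroid classifier (`NearestCentroidSuggester`, not the paper's architecture) suggests a label for each cell, the annotator accepts or corrects it, and every confirmed label updates the model immediately. Cell features are synthetic 2-D points around three hypothetical cell-type centres, and "manual work" is assumed to be the fraction of cells whose suggestion had to be typed or corrected.

```python
import random
from dataclasses import dataclass, field


@dataclass
class NearestCentroidSuggester:
    """Hypothetical online labeler (not HALS): keeps a running mean feature vector per class."""
    sums: dict = field(default_factory=dict)
    counts: dict = field(default_factory=dict)

    def suggest(self, x):
        # No suggestion is possible until at least one example has been labeled.
        if not self.counts:
            return None

        def dist(label):
            n = self.counts[label]
            centroid = [s / n for s in self.sums[label]]
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.counts, key=dist)

    def learn(self, x, label):
        # Add the confirmed example to its class's running sum and count.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1


def simulate_session(cells, true_labels):
    """Replay a session with a simulated annotator; return the fraction labeled by hand."""
    model = NearestCentroidSuggester()
    manual = 0
    for x, y in zip(cells, true_labels):
        if model.suggest(x) != y:
            manual += 1  # assumed cost: annotator must type or correct this label
        model.learn(x, y)  # the confirmed label updates the model in real time
    return manual / len(cells)


def make_cells(n, seed=0):
    """Synthetic 2-D 'cell features' around three hypothetical cell-type centres."""
    rng = random.Random(seed)
    centres = {"tumor": (0.0, 0.0), "lymphocyte": (5.0, 5.0), "stroma": (0.0, 5.0)}
    names = list(centres)
    cells, labels = [], []
    for _ in range(n):
        name = rng.choice(names)
        cx, cy = centres[name]
        cells.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
        labels.append(name)
    return cells, labels


if __name__ == "__main__":
    cells, labels = make_cells(500)
    manual_fraction = simulate_session(cells, labels)
    print(f"simulated manual work reduction: {1 - manual_fraction:.2%}")
```

In this toy setting the first example of each class must always be labeled by hand, after which the suggester is usually right, which illustrates why a model that starts uninitialized can still cut manual effort quickly. Real cell images are far less separable, and the sketch makes no claim about the figures reported above.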