Industrial and Applied Genomics, AI and Cognitive Software, IBM Research - Almaden, San Jose, CA, USA.
NSF Center for Cellular Construction, University of California San Francisco, San Francisco, CA, USA.
Sci Rep. 2020 Jul 22;10(1):12142. doi: 10.1038/s41598-020-68662-3.
The acquisition of increasingly large plankton digital image datasets requires automatic methods of recognition and classification. As data size and collection speed increase, manual annotation and database representation often become bottlenecks to using machine learning algorithms for the taxonomic classification of plankton species in field studies. In this paper we present a novel set of algorithms that perform accurate detection and classification of plankton species with minimal supervision. Our algorithms approach the performance of existing supervised machine learning algorithms when tested on a plankton dataset generated by a custom-built lensless digital device. Similar results are obtained on a larger image dataset from the Woods Hole Oceanographic Institution. Additionally, we introduce a new algorithm that performs anomaly detection on unclassified samples. Here an anomaly is defined as a significant deviation from the established classification. Our algorithms are designed to provide a new way to monitor the environment with a class of rapid online intelligent detectors.
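The abstract defines an anomaly as a significant deviation from the established classification. One generic way to make that idea concrete is sketched below. It is not the authors' algorithm, which the abstract does not describe. It assumes each image has already been reduced to a numeric feature vector, summarizes each established class by its centroid, and flags a new sample as anomalous when its distance to the nearest centroid exceeds the mean training distance plus `k` standard deviations. The function names, the centroid model, and the threshold rule with `k=3.0` are all hypothetical choices for illustration.

```python
import math
from statistics import mean, pstdev


def centroids(features, labels):
    """Hypothetical helper: mean feature vector of each established class."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    # zip(*xs) walks the vectors column by column, so each entry is a per-dimension mean
    return {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}


def nearest(x, cents):
    """Return (label, distance) of the closest class centroid to sample x."""
    return min(((y, math.dist(x, c)) for y, c in cents.items()), key=lambda t: t[1])


def fit_threshold(features, labels, cents, k=3.0):
    """Assumed rule: mean + k * std of training distances to each sample's own centroid."""
    d = [math.dist(x, cents[y]) for x, y in zip(features, labels)]
    return mean(d) + k * pstdev(d)


def is_anomaly(x, cents, threshold):
    """Flag x when it lies farther than `threshold` from every class centroid."""
    _, d = nearest(x, cents)
    return d > threshold


if __name__ == "__main__":
    # Toy 2-D "features" for two well-separated classes (made-up data)
    train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labels = ["a"] * 4 + ["b"] * 4
    cents = centroids(train, labels)
    thr = fit_threshold(train, labels, cents)
    for sample in [(0.5, 0.5), (10, 10.4), (5, 5)]:
        label, dist = nearest(sample, cents)
        print(sample, label, round(dist, 3), is_anomaly(sample, cents, thr))
```

A sample between the two toy clusters, such as `(5, 5)`, is assigned a nearest class but is still flagged as anomalous because it is far from both centroids. This reflects the distinction between classifying a sample and judging whether it fits the established classification at all.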