Machine learning algorithms accurately identify free-living marine nematode species.

Author information

Marine Science Institute, Federal University of São Paulo, Santos, São Paulo, Brazil.

Oceanographic Institute, University of São Paulo, São Paulo, Brazil.

Publication information

PeerJ. 2023 Oct 9;11:e16216. doi: 10.7717/peerj.16216. eCollection 2023.

DOI:10.7717/peerj.16216

PMID:37842061

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10569207/

Abstract

BACKGROUND

Identifying species, particularly small metazoans, remains a daunting challenge, and the phylum Nematoda is no exception. Typically, nematode species are differentiated based on morphometry and the presence or absence of certain characters. However, recent advances in artificial intelligence, particularly machine learning (ML) algorithms, offer promising solutions for automating species identification, especially in taxonomically complex groups. By training ML models on extensive datasets of accurately identified specimens, the models can learn to recognize patterns in nematodes' morphological and morphometric features. This enables them to make precise identifications of newly encountered individuals. Implementing ML algorithms can improve the speed and accuracy of species identification and allow researchers to process vast amounts of data efficiently. Furthermore, it empowers non-taxonomists to make reliable identifications. The objective of this study is to evaluate the performance of ML algorithms in identifying species of free-living marine nematodes, focusing on two well-known genera: Acantholaimus Allgén, 1933 and Sabatieria Rouville, 1903.

METHODS

A total of 40 species of Acantholaimus and 60 species of Sabatieria were considered. The measurements and identifications were obtained from the original species descriptions for both genera; this compilation included information on the presence or absence of specific characters, as well as morphometric data. To assess species-identification performance, four ML algorithms were employed: Random Forest (RF), Stochastic Gradient Boosting (SGBoost), Support Vector Machine (SVM) with both linear and radial kernels, and K-nearest neighbor (KNN).
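As a rough illustration of how a classifier can assign a specimen to a species from morphometric and presence/absence features, the following minimal sketch implements only one of the four algorithms named above, K-nearest neighbor, in plain Python. It is not the authors' pipeline: the feature names, measurements, species labels, and the min-max scaling step are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical training records, invented for illustration only:
# (body length in micrometres, amphid width as % of head diameter,
#  a presence/absence character coded 1/0) -> species label.
TRAIN = [
    ((1200.0, 30.0, 1), "species_A"),
    ((1250.0, 32.0, 1), "species_A"),
    ((1180.0, 29.0, 1), "species_A"),
    ((2100.0, 55.0, 0), "species_B"),
    ((2050.0, 52.0, 0), "species_B"),
    ((2150.0, 57.0, 0), "species_B"),
]

def fit_scaler(rows):
    """Return (min, range) per feature so no single feature dominates distances."""
    columns = list(zip(*rows))
    return [(min(c), (max(c) - min(c)) or 1.0) for c in columns]

def scale(row, scaler):
    """Min-max scale one feature vector with a previously fitted scaler."""
    return tuple((v - lo) / span for v, (lo, span) in zip(row, scaler))

def knn_predict(query, train, k=3):
    """Label a specimen by majority vote among its k nearest training records."""
    scaler = fit_scaler([features for features, _ in train])
    q = scale(query, scaler)
    neighbours = sorted(
        train, key=lambda item: math.dist(q, scale(item[0], scaler))
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(test, train, k=3):
    """Fraction of held-out specimens assigned to their recorded species."""
    correct = sum(knn_predict(f, train, k) == label for f, label in test)
    return correct / len(test)

# Hypothetical held-out specimens.
TEST = [((1220.0, 31.0, 1), "species_A"), ((2080.0, 54.0, 0), "species_B")]
```

Calling `accuracy(TEST, TRAIN)` scores the held-out specimens against the training records. The same train-then-score pattern would apply to the other algorithms, which in practice would come from an established ML library rather than being hand-written.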

RESULTS

For both genera, the Random Forest (RF) algorithm demonstrated the highest accuracy in classifying specimens into their respective species, achieving an accuracy of 93% for Acantholaimus and 100% for Sabatieria. Only a single Acantholaimus individual in the test data was misclassified.
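Accuracy here is the share of test specimens assigned to the correct species. The short sketch below shows that arithmetic. The abstract does not report the test-set size, so the count of 15 specimens is a hypothetical value, chosen only because one error in 15 rounds to 93%.

```python
def accuracy(n_correct, n_total):
    """Proportion of test specimens assigned to the correct species."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_correct / n_total

# Hypothetical test-set size (not reported in the abstract).
hypothetical_test_size = 15
misclassified = 1

pct = 100 * accuracy(hypothetical_test_size - misclassified, hypothetical_test_size)
print(round(pct))  # 93
```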

CONCLUSION

These results highlight the overall effectiveness of ML algorithms in species identification. Moreover, they demonstrate that the identification of marine nematodes can be automated, streamlining biodiversity and ecological studies and making species identification more accessible, efficient, and scalable. Ultimately, this will contribute to the understanding and conservation of biodiversity.
