Bartlett Peter, Eberhardt Ursula, Schütz Nicole, Beker Henry J
La Baraka, Gorse Hill Road, Virginia Water, Surrey, GU25 4AP, UK.
Staatliches Museum für Naturkunde Stuttgart, Rosenstein 1, 70191, Stuttgart, Germany.
IMA Fungus. 2022 Jun 30;13(1):13. doi: 10.1186/s43008-022-00099-x.
The genus Hebeloma is renowned as difficult when it comes to species determination. Historically, many dichotomous keys have been published and used with varying degrees of success. Over the last 20 years the authors have built a database of Hebeloma collections containing not only metadata but also parametrized morphological descriptions, with micromorphological characters analysed and included for about a third of the collections, and DNA sequences for almost every collection. The database now holds about 9000 collections, including nearly every type collection worldwide, and represents over 120 different taxa. Almost every collection has been analysed and identified to species using a combination of the available molecular and morphological data in addition to locality and habitat information. Based on these data, an Artificial Intelligence (AI) machine-learning species identifier has been developed that takes as input locality data and a small number of the morphological parameters. On a random test set of more than 600 collections from the database, none of which were used to train the identifier, the species identifier correctly identified 77% of collections with its highest-probability match, 96% within its three most likely determinations, and over 99% within its five most likely determinations.
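The three figures reported above are what is usually called top-1, top-3 and top-5 accuracy. A minimal sketch of how such a metric can be computed is given here, assuming the identifier returns a probability per candidate species for each collection. The function name `top_k_accuracy`, the placeholder species labels and the toy probabilities are all hypothetical and are not taken from the authors' identifier or database.

```python
def top_k_accuracy(probabilities, true_labels, k):
    """Return the fraction of samples whose true label is among
    the k classes with the highest predicted probability."""
    hits = 0
    for probs, truth in zip(probabilities, true_labels):
        # Rank candidate labels by predicted probability, highest first.
        ranked = sorted(probs, key=probs.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Made-up predictions for three hypothetical collections (illustration only).
predictions = [
    {"species_A": 0.6, "species_B": 0.3, "species_C": 0.1},
    {"species_A": 0.5, "species_B": 0.4, "species_C": 0.1},
    {"species_A": 0.2, "species_B": 0.3, "species_C": 0.5},
]
truths = ["species_A", "species_B", "species_A"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(predictions, truths, k):.2f}")
```

Top-k accuracy can only stay the same or rise as k grows, which matches the pattern of the reported 77%, 96% and over 99%.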