Reliable genomic strategies for species classification of plant genetic resources.

Affiliation

Centre for Genetic Resources, Wageningen University and Research, P.O. Box 16, 6700 AA, Wageningen, The Netherlands.

Publication information

BMC Bioinformatics. 2021 Mar 31;22(1):173. doi: 10.1186/s12859-021-04018-6.

Abstract

BACKGROUND

To address the need for easy and reliable species classification in plant genetic resources collections, we assessed the potential of five classifiers (Random Forest, Neighbour-Joining, 1-Nearest Neighbour, a conservative variety of 3-Nearest Neighbours and Naive Bayes). We investigated the effects of the number of accessions per species and misclassification rate on classification success, and validated the generic value of the results with three complete datasets.
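The abstract does not define what makes the 3-Nearest Neighbours variant "conservative". The sketch below illustrates one plausible reading, stated here as an assumption: a species is assigned only when all three nearest accessions carry the same label, and the accession is otherwise left unclassified. The genotype coding (allele counts 0, 1, 2), the Manhattan distance, and the names `manhattan` and `conservative_3nn` are hypothetical choices for illustration, not the authors' implementation.

```python
def manhattan(a, b):
    """Distance between two genotype vectors coded as allele counts (0, 1, 2)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def conservative_3nn(query, training):
    """Assign a species only if the three nearest accessions all agree.

    `training` is a list of (genotype_vector, species_label) pairs.
    Returns None ("unclassified") when the neighbours disagree or when
    fewer than three reference accessions are available.
    ASSUMPTION: the unanimity rule is one interpretation of "conservative".
    """
    if len(training) < 3:
        return None
    nearest = sorted(training, key=lambda item: manhattan(query, item[0]))[:3]
    labels = {label for _, label in nearest}
    return labels.pop() if len(labels) == 1 else None


# Toy reference set: two species, four markers each (made-up values).
training = [
    ([0, 0, 2, 2], "species_A"),
    ([0, 1, 2, 2], "species_A"),
    ([0, 0, 2, 1], "species_A"),
    ([2, 2, 0, 0], "species_B"),
    ([2, 1, 0, 0], "species_B"),
    ([2, 2, 0, 1], "species_B"),
]

clear_case = conservative_3nn([0, 0, 2, 2], training)      # neighbours agree
ambiguous_case = conservative_3nn([1, 1, 1, 1], training)  # neighbours disagree
print(clear_case, ambiguous_case)
```

Under this reading, the classifier trades coverage for reliability: an intermediate genotype is left unclassified rather than assigned by a 2-to-1 majority vote.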

RESULTS

We found the conservative variety of 3-Nearest Neighbours to be the most reliable classifier when varying species representation and misclassification rate. Analysis of the three complete datasets showed that this finding has generic value. Additionally, we present various options for marker selection for classification tasks such as these.

CONCLUSIONS

Large-scale genomic data are increasingly being produced for genetic resources collections. These data are useful to address species classification issues regarding crop wild relatives, and improve genebank documentation. Implementation of a classification method that can improve the quality of bad datasets without gold standard training data is considered an innovative and efficient way to improve genebank documentation.
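As a rough illustration of how labels might be checked without gold standard training data, the sketch below re-classifies each accession against all other accessions in the same collection and flags those whose recorded species is contradicted. The leave-one-out scheme, the unanimity rule, the accession identifiers, and the function names are all assumptions made for this example; the abstract does not describe the actual procedure.

```python
def distance(a, b):
    """Manhattan distance between genotype vectors (allele counts 0, 1, 2)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def unanimous_3nn(query, reference):
    """Return a label only if the three nearest reference accessions agree."""
    nearest = sorted(reference, key=lambda item: distance(query, item[0]))[:3]
    labels = {label for _, label in nearest}
    return labels.pop() if len(nearest) == 3 and len(labels) == 1 else None


def flag_suspect_labels(collection):
    """Leave-one-out check of recorded species labels.

    `collection` is a list of (accession_id, genotype, recorded_species).
    Each accession is classified using every *other* accession as reference,
    so no separately curated training set is needed (ASSUMED workflow).
    Returns (accession_id, recorded, predicted) for contradicted labels.
    """
    suspects = []
    for i, (acc_id, genotype, recorded) in enumerate(collection):
        others = [(g, s) for j, (_, g, s) in enumerate(collection) if j != i]
        predicted = unanimous_3nn(genotype, others)
        if predicted is not None and predicted != recorded:
            suspects.append((acc_id, recorded, predicted))
    return suspects


# Made-up collection; ACC004 is deliberately recorded under the wrong species.
collection = [
    ("ACC001", [0, 0, 2, 2], "species_A"),
    ("ACC002", [0, 1, 2, 2], "species_A"),
    ("ACC003", [0, 0, 2, 1], "species_A"),
    ("ACC004", [0, 0, 1, 2], "species_B"),
    ("ACC005", [2, 2, 0, 0], "species_B"),
    ("ACC006", [2, 1, 0, 0], "species_B"),
    ("ACC007", [2, 2, 0, 1], "species_B"),
]

suspects = flag_suspect_labels(collection)
print(suspects)
```

In this toy collection only the deliberately mislabelled accession is flagged. The remaining accessions are left unclassified rather than contradicted, because the abstaining rule returns no label whenever the neighbours disagree.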


