Akdemir Deniz, Rio Simon, Isidro Y Sánchez Julio
Agriculture & Food Science Centre, Animal and Crop Science Division, University College Dublin, Dublin, Ireland.
Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain.
Front Genet. 2021 May 7;12:655287. doi: 10.3389/fgene.2021.655287. eCollection 2021.
A major barrier to the wider use of supervised learning in emerging applications such as genomic selection is the lack of sufficient, representative labeled data for training prediction models. In many applications, the amount and quality of labeled training data are limited, so carefully selecting which training examples to label can help improve the accuracy of predictive learning tasks. In this paper, we present TrainSel, an R package that provides flexible, efficient, and easy-to-use tools for the selection of training populations (STP). We illustrate its use, performance, and potential in four different supervised learning applications, both within and outside plant breeding.
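To make the idea of "choosing which examples to label" concrete, the sketch below implements one criterion commonly discussed for STP, D-optimality, with a simple greedy search. It assumes each candidate (for example, a genotype) is already described by a short numeric feature vector. It is not TrainSel's API or its optimization algorithm. The function name `greedy_d_optimal` and the `ridge` parameter are hypothetical and exist only for illustration. The selection maximizes det(ridge·I + X_SᵀX_S) over the chosen rows X_S. Adding a row x multiplies this determinant by (1 + xᵀM⁻¹x), so each step picks the candidate with the largest such score and updates M⁻¹ with the Sherman-Morrison formula.

```python
"""Greedy D-optimal training-set selection: an illustrative sketch, not TrainSel."""
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def _mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    """Return the matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the inner product of u and v."""
    return sum(a * b for a, b in zip(u, v))


def greedy_d_optimal(
    candidates: Sequence[Sequence[float]], k: int, ridge: float = 1e-3
) -> List[int]:
    """Hypothetical helper: pick k candidate indices that greedily maximize
    det(ridge * I + X_S^T X_S), where X_S holds the selected feature rows.

    By the matrix determinant lemma, adding row x multiplies the determinant
    by (1 + x^T M^{-1} x), so each step takes the candidate with the largest
    x^T M^{-1} x and updates M^{-1} via Sherman-Morrison.
    """
    if not 0 < k <= len(candidates):
        raise ValueError("k must be between 1 and the number of candidates")
    p = len(candidates[0])
    # M starts as ridge * I, so its inverse is (1 / ridge) * I.
    m_inv: Matrix = [
        [(1.0 / ridge) if i == j else 0.0 for j in range(p)] for i in range(p)
    ]
    chosen: List[int] = []
    remaining = set(range(len(candidates)))
    for _ in range(k):
        # Score = x^T M^{-1} x; ties go to the lowest index (max keeps the first).
        best = max(
            sorted(remaining),
            key=lambda i: _dot(candidates[i], _mat_vec(m_inv, candidates[i])),
        )
        x = candidates[best]
        mx = _mat_vec(m_inv, x)  # M^{-1} x (M^{-1} stays symmetric)
        denom = 1.0 + _dot(x, mx)
        # Sherman-Morrison: (M + x x^T)^{-1} = M^{-1} - (M^{-1}x)(M^{-1}x)^T / denom
        m_inv = [
            [m_inv[i][j] - mx[i] * mx[j] / denom for j in range(p)] for i in range(p)
        ]
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Toy pool: row 1 is a near-duplicate of row 0.
    pool = [[2.0, 0.0], [1.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
    print(greedy_d_optimal(pool, 2))  # [0, 2]
```

In the toy pool, the near-duplicate `[1.9, 0.1]` has a large norm but is skipped in favor of `[0.0, 1.0]`. Once `[2.0, 0.0]` is selected, the near-duplicate adds almost no new information. This illustrates why a representative training set can matter more than simply labeling the "largest" or most numerous examples. A greedy pass like this is cheap but not guaranteed to be globally optimal. Dedicated STP tools typically search the subset space more thoroughly and support other criteria.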