Genomics Division, National Institute of Agricultural Sciences, Jeonju, 54874, Korea.
Research and Development Center, Insilicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea.
Sci Rep. 2021 Apr 13;11(1):8019. doi: 10.1038/s41598-021-87281-0.
Bellflower is an edible ornamental garden plant in Asia. To predict flower color in bellflower plants, a transcriptome-wide approach combining machine learning, transcriptome, and genotyping chip analyses was used to identify SNP markers. Six machine learning methods were deployed to explore the classification potential of the selected SNPs as features in two datasets: training (60 RNA-Seq samples) and validation (480 Fluidigm chip samples). SNP selection was performed sequentially. First, 96 SNPs were selected from the transcriptome-wide SNPs using principal component analysis (PCA). Then, 9 of the 96 SNPs were identified using random forest-based feature selection applied to the Fluidigm chip dataset. Among the six machine learning models, the random forest (RF) model produced higher classification performance than the others. The 9 SNP marker candidates selected for flower color classification were verified using genomic DNA PCR with Sanger sequencing. Our results suggest that this methodology could be used for future selection of breeding traits even when the plant accessions are highly heterogeneous.
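The sequential SNP selection described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how PCA was used to pick SNPs, so ranking by absolute loading on the first principal component is an assumption, and a bagged ensemble of one-split decision stumps stands in for a full random forest. The toy genotype matrix (0/1/2 coding, 40 samples, 8 SNPs), the two-class labels, and all function names are hypothetical. A real analysis would more likely use an established machine learning library.

```python
import random

def pc1_loadings(X, iters=200):
    """Loadings of the first principal component, found by power iteration."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]  # center each SNP
    v = [1.0 / p ** 0.5] * p
    for _ in range(iters):
        scores = [sum(z[j] * v[j] for j in range(p)) for z in Z]            # Z v
        w = [sum(Z[i][j] * scores[i] for i in range(n)) for j in range(p)]  # Z^T Z v
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def stump_importance(X, y, n_trees=200, seed=0):
    """Summed Gini decrease per feature over bagged one-split trees.

    Simplified stand-in for random forest importance (assumption):
    each tree sees a bootstrap sample and a random subset of features.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, int(p ** 0.5))
    importance = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(p), m)              # random feature subset
        parent = gini([y[i] for i in idx])
        best_gain, best_j = 0.0, None
        for j in feats:
            for t in (0.5, 1.5):                     # thresholds for 0/1/2 genotypes
                left = [y[i] for i in idx if X[i][j] < t]
                right = [y[i] for i in idx if X[i][j] >= t]
                child = (len(left) * gini(left) + len(right) * gini(right)) / n
                if parent - child > best_gain:
                    best_gain, best_j = parent - child, j
        if best_j is not None:
            importance[best_j] += best_gain
    return importance

def top_k(scores, k):
    """Indices of the k largest scores by absolute value."""
    return sorted(range(len(scores)), key=lambda j: -abs(scores[j]))[:k]

# Hypothetical toy data: SNP 0 tracks the class perfectly, SNP 1 imperfectly,
# SNP 2 is unrelated, and the remaining five SNPs are nearly monomorphic.
n_samples = 40
y = [0 if i < 20 else 1 for i in range(n_samples)]
X = []
for i in range(n_samples):
    snp0 = 2 * y[i]
    snp1 = 2 * y[i] if i % 5 else 2 - 2 * y[i]
    snp2 = i % 3
    rare = [1 if i % 10 == k else 0 for k in range(5)]
    X.append([snp0, snp1, snp2] + rare)

# Stage 1: keep the SNPs with the largest absolute PC1 loadings.
stage1 = top_k(pc1_loadings(X), 4)

# Stage 2: rank the stage 1 SNPs by ensemble importance and keep the best two.
X_sub = [[row[j] for j in stage1] for row in X]
importance = stump_importance(X_sub, y)
selected = [stage1[c] for c in top_k(importance, 2)]
print("stage 1 SNPs:", stage1)
print("selected SNPs:", selected)
```

Note that the first stage is unsupervised (it favors SNPs that vary strongly across samples), while the second stage uses the class labels. The same two-stage shape applies when reducing transcriptome-wide SNPs to a chip-sized panel and then to a small marker set.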