Department of Animal Science, Faculty of Agriculture and Natural Resources, Arak University, Arak, Iran.
Sci Rep. 2023 Jul 18;13(1):11592. doi: 10.1038/s41598-023-38601-z.
Assigning an individual to its true population of origin using a small panel of discriminant SNP markers is one of the most important practical applications of genomic data. The aim of this study was to evaluate the potential of different Artificial Neural Network (ANN) approaches, comprising Deep Neural Networks (DNN) and the Garson and Olden methods, for selecting informative SNP markers from high-throughput genotyping data that can trace the true breed of unknown samples. A total of 795 animals from 37 breeds, genotyped with the Illumina SNP 50k BeadChip, were used in the study, and principal component analysis (PCA), log-likelihood ratios (LLR), and Neighbor-Joining (NJ) were applied to assess the performance of the different assignment methods. The results revealed that the DNN, Garson, and Olden methods were able to assign individuals to their true populations with 4270, 4937, and 7999 SNP markers, respectively. PCA was used to determine how the animals were allocated to groups using all genotyped markers available on the 50k BeadChip and the subsets of SNP markers identified by the different methods. The results indicated that all SNP panels were able to assign individuals to their true breeds. The assignment success rate, assessed at different LLR thresholds, showed that all three methods reached a 70% success rate, with 110, 208, and 178 markers for the DNN, Garson, and Olden methods, respectively. The DNN also performed better than the other two approaches, achieving 93% accuracy at the most stringent threshold. Finally, the identified SNPs were successfully applied to independent out-group breeds consisting of 120 individuals from eight breeds, and the results indicated that these markers correctly allocated all unknown samples to their true population of origin. Furthermore, the NJ tree of allele-sharing distances on the validation dataset showed that the DNN has a high potential for feature selection. Overall, the results of this study indicate that the DNN technique is an efficient strategy for selecting a reduced pool of highly discriminant markers for assigning individuals to their true population of origin.
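The Garson and Olden methods named above rank input features by the weights of a trained neural network. The following is a minimal sketch of both, assuming the common textbook formulation for a network with one hidden layer and a single output; the abstract does not describe the authors' implementation, so the function names, the weight shapes, and the random weights standing in for a trained model are all illustrative assumptions.

```python
import numpy as np

def garson_importance(w_in, w_out):
    """Garson's algorithm (assumed single-hidden-layer, single-output form).

    w_in  : array (n_inputs, n_hidden), input-to-hidden weights
    w_out : array (n_hidden,), hidden-to-output weights
    Returns non-negative relative importances that sum to 1.
    Assumes no hidden unit has all-zero contributions (avoids division by zero).
    """
    # Absolute contribution of each input through each hidden unit.
    contrib = np.abs(w_in) * np.abs(w_out)
    # Share of each input within each hidden unit.
    share = contrib / contrib.sum(axis=0, keepdims=True)
    # Sum shares over hidden units, then normalise across inputs.
    per_input = share.sum(axis=1)
    return per_input / per_input.sum()

def olden_importance(w_in, w_out):
    """Olden's connection-weight method: signed sum over hidden units
    of (input-to-hidden weight) * (hidden-to-output weight)."""
    return w_in @ w_out

# Hypothetical weights: 6 SNP inputs, 4 hidden units (not data from the study).
rng = np.random.default_rng(0)
w_in = rng.normal(size=(6, 4))
w_out = rng.normal(size=4)

garson = garson_importance(w_in, w_out)
olden = olden_importance(w_in, w_out)

# Rank markers by importance and keep the top 3 as a reduced panel.
top_garson = np.argsort(-garson)[:3]
top_olden = np.argsort(-np.abs(olden))[:3]
print("Garson top markers:", top_garson)
print("Olden top markers:", top_olden)
```

Garson's measure discards the sign of the weights and reports only relative magnitude, whereas Olden's measure keeps the sign, so opposing contributions can cancel. The two methods can therefore select different marker subsets from the same network.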