Seo Dongwon, Cho Sunghyun, Manjula Prabuddha, Choi Nuri, Kim Young-Kuk, Koh Yeong Jun, Lee Seung Hwan, Kim Hyung-Yong, Lee Jun Heon
Division of Animal and Dairy Science, Chungnam National University, Daejeon 34134, Korea.
Bio-AI Convergence Research Center, Chungnam National University, Daejeon 34134, Korea.
Animals (Basel). 2021 Jan 19;11(1):241. doi: 10.3390/ani11010241.
A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence in the origin of that population. This would help protect native genetic resources in each country's market. In this study, 283 samples from 20 lines, consisting of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were genotyped with a 600 k high-density single nucleotide polymorphism (SNP) array to determine the optimal marker combination comprising the minimum number of markers. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. During marker selection, a total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups, and they reduced the number of genetic components required, confirming that the groups could be classified efficiently with a small marker set. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not change significantly, and the target chicken group could be clearly distinguished from the other populations.
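To make the fixation index concrete, the following is a minimal sketch of a per-SNP Fst between a case and a control group. The abstract does not state which Fst estimator the study used, so the simple heterozygosity-based form, Fst = (H_T - H_S) / H_T, is an assumption here, as are the function names and the 0/1/2 genotype coding (count of the alternate allele).

```python
def allele_freq(genotypes):
    """Alternate-allele frequency from genotypes coded 0, 1, or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def fst(case, control):
    """Heterozygosity-based Fst for one biallelic SNP (illustrative estimator).

    H_T is the expected heterozygosity of the pooled sample and H_S is the
    sample-size-weighted mean of the within-group expected heterozygosities.
    """
    p_case, p_ctrl = allele_freq(case), allele_freq(control)
    n_case, n_ctrl = len(case), len(control)
    n_total = n_case + n_ctrl

    p_pooled = (n_case * p_case + n_ctrl * p_ctrl) / n_total
    h_total = 2 * p_pooled * (1 - p_pooled)
    h_within = (
        n_case * 2 * p_case * (1 - p_case)
        + n_ctrl * 2 * p_ctrl * (1 - p_ctrl)
    ) / n_total

    # A SNP that is monomorphic in the pooled sample carries no signal.
    if h_total == 0:
        return 0.0
    return (h_total - h_within) / h_total
```

Under this estimator, a SNP fixed for opposite alleles in the two groups gives an Fst of 1, and a SNP with identical allele frequencies in both groups gives 0, which is why markers that separate the case group well raise the Fst between case and control.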
The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to determine, from a large number of SNP markers, the optimal combination with the minimum number of markers needed to distinguish the target population.
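The idea of searching for the smallest marker set that still separates the target group can be sketched as below. This is not the study's pipeline: it swaps in a nearest-centroid classifier and a ranking by mean genotype difference in place of AdaBoost, Random Forest, Decision Tree, and GWAS, and it scores accuracy on the training samples rather than on held-out data. All function names, the 0.95 target, and the 0/1/2 genotype coding are assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def rank_markers(X, y):
    """Order marker indices by |mean case genotype - mean control genotype|."""
    def gap(j):
        case = [row[j] for row, label in zip(X, y) if label == 1]
        ctrl = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(mean(case) - mean(ctrl))

    return sorted(range(len(X[0])), key=gap, reverse=True)


def centroid_accuracy(X, y, markers):
    """Training accuracy of a nearest-centroid classifier on chosen markers."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean([r[j] for r in rows]) for j in markers]

    correct = 0
    for row, label in zip(X, y):
        # Squared Euclidean distance to each group centroid.
        dist = {
            c: sum((row[j] - m) ** 2 for j, m in zip(markers, cen))
            for c, cen in centroids.items()
        }
        correct += min(dist, key=dist.get) == label
    return correct / len(y)


def smallest_marker_set(X, y, target=0.95):
    """Add top-ranked markers one at a time until the target accuracy is met."""
    ranked = rank_markers(X, y)
    for k in range(1, len(ranked) + 1):
        if centroid_accuracy(X, y, ranked[:k]) >= target:
            return ranked[:k]
    return ranked  # target never reached; fall back to all markers


# Toy data: rows are samples, columns are SNPs, label 1 marks the case group.
X = [[2, 0, 1], [2, 1, 1], [2, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 0, 0, 0]
selected = smallest_marker_set(X, y)
```

On this toy data the first SNP alone separates the two groups, so the search stops after one marker. With real genotype data, accuracy would need to be estimated by cross-validation or on an independent sample set, as the verification study in the abstract does with additional breeds.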