Ayat Maryam, Domaratzki Mike
Lactanet, Sainte-Anne-de-Bellevue, QC, Canada.
Department of Computer Science, University of Western Ontario, London, ON, Canada.
Front Bioinform. 2022 Aug 31;2:960889. doi: 10.3389/fbinf.2022.960889. eCollection 2022.
Genomic selection, which predicts phenotypes such as yield and drought resistance in crops from high-density markers positioned throughout the genome of the varieties, is moving towards machine learning techniques to make predictions on complex traits that are controlled by several genes. In this paper, we consider sparse Bayesian learning and ensemble learning as techniques for genomic selection and for ranking markers based on their relevance to a trait. We define and explore two different forms of sparse Bayesian learning, one for predicting phenotypes and one for identifying the most influential markers of a trait. We apply our methods to a yeast dataset and analyse our results with respect to existing related work, trait heritability, and the accuracies obtained from linear and Gaussian kernel functions. We find that sparse Bayesian methods are not only competitive with other machine learning methods in predicting yeast growth in different environments, but are also capable of identifying the most important markers, including those with positive and those with negative effects on growth, from which biologists can gain insight. This attribute can make our proposed ensemble of sparse Bayesian learners favourable for ranking markers based on their relevance to a trait.
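As a rough illustration of the idea summarised above, the following sketch fits a linear sparse Bayesian model with one prior precision per marker (automatic relevance determination) and averages the weights of several such models to rank markers. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`simulate_markers`, `sparse_bayes_fit`, `rank_markers`), the simulated 0/1 marker data, the MacKay-style update rule, and the use of bootstrap resampling to build the ensemble are all assumptions made here for illustration.

```python
import numpy as np


def simulate_markers(n=120, p=30, seed=0):
    """Hypothetical toy data: binary markers, three of which affect the trait."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, p)).astype(float)
    w_true = np.zeros(p)
    w_true[[3, 10, 20]] = [2.0, -1.5, 1.0]  # assumed causal markers and effects
    y = X @ w_true + 0.3 * rng.standard_normal(n)
    return X, y


def sparse_bayes_fit(X, y, n_iter=100, alpha_max=1e6):
    """Linear sparse Bayesian regression with a separate prior precision
    alpha_j per marker. Markers whose alpha_j exceeds alpha_max are pruned."""
    X = X - X.mean(axis=0)  # centre so that no intercept is needed
    y = y - y.mean()
    n, p = X.shape
    alpha = np.ones(p)                  # prior precisions of the weights
    beta = 1.0 / (np.var(y) + 1e-12)    # noise precision
    active = np.arange(p)               # indices of markers still in the model
    weights = np.zeros(p)
    for _ in range(n_iter):
        Xa = X[:, active]
        # Gaussian posterior over the active weights
        sigma = np.linalg.inv(np.diag(alpha[active]) + beta * Xa.T @ Xa)
        mean = beta * sigma @ Xa.T @ y
        # gamma_j measures how well the data determine weight j (0 to 1)
        gamma = np.clip(1.0 - alpha[active] * np.diag(sigma), 1e-12, 1.0)
        alpha[active] = gamma / (mean ** 2 + 1e-12)
        resid = y - Xa @ mean
        beta = max(n - gamma.sum(), 1e-6) / (resid @ resid + 1e-12)
        # drop markers whose prior has collapsed onto zero
        keep = alpha[active] < alpha_max
        weights = np.zeros(p)
        weights[active[keep]] = mean[keep]
        active = active[keep]
        if active.size == 0:
            break
    return weights


def rank_markers(X, y, n_models=10, seed=0):
    """Average the weights of models fitted on bootstrap resamples and rank
    markers by absolute mean weight; the sign gives the effect direction."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    fits = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample of individuals
        fits.append(sparse_bayes_fit(X[idx], y[idx]))
    mean_weights = np.mean(fits, axis=0)
    ranking = np.argsort(-np.abs(mean_weights))
    return ranking, mean_weights


if __name__ == "__main__":
    X, y = simulate_markers()
    ranking, mean_weights = rank_markers(X, y)
    for j in ranking[:5]:
        print(f"marker {j}: mean weight {mean_weights[j]:+.3f}")
```

Ranking by the absolute value of the averaged weight while keeping its sign is one simple way to surface markers with either positive or negative effects; the paper's own ensemble and its Gaussian-kernel variant may differ from this linear sketch.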