The School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, UK.
The School of Biological Sciences, University of Aberdeen, Aberdeen, UK.
Mol Ecol Resour. 2022 Aug;22(6):2248-2261. doi: 10.1111/1755-0998.13611. Epub 2022 Apr 18.
The molecular characterization of complex behaviours is a challenging task, as a range of different factors is often involved in producing the observed phenotype. An established approach, known as 'neurogenomics', is to look at the overall expression levels of brain genes to select the best candidates associated with patterns of interest. However, traditional neurogenomic analyses have some well-known limitations: above all, the usually limited number of biological replicates relative to the number of genes tested, a problem known as the "curse of dimensionality." In this study we implemented a machine learning (ML) approach that can be used as a complement to more established methods of transcriptomic analysis. We tested three supervised learning algorithms (Random Forests, Lasso and Elastic net Regularized Generalized Linear Model, and Support Vector Machine) for their performance in characterizing transcriptomic patterns and identifying genes associated with the honeybee waggle dance. We then matched the results of these analyses with the traditional outputs of differential gene expression analyses and identified two promising candidates for the neural regulation of the waggle dance: boss and hnRNP A1. Overall, our study demonstrates the application of ML to analyse transcriptomic data and identify candidate genes underlying social behaviour. This approach has great potential for application to a wide range of scenarios in evolutionary ecology when investigating the genomic basis of complex phenotypic traits, and it can offer some clear advantages over the established tools of gene expression analysis, making it a valuable complement for future studies.
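The abstract does not say how the ML outputs were matched against the differential expression results. As a minimal sketch of one way such a matching step could work, the Python below keeps genes that rank highly in several classifiers and also appear in a differential expression list. The function name `consensus_candidates`, the `top_k` and `min_votes` thresholds, and all gene identifiers are hypothetical placeholders, not values or data from the study.

```python
from collections import Counter


def consensus_candidates(ml_rankings, de_genes, top_k=3, min_votes=2):
    """Return genes that appear in the top_k of at least min_votes ML
    rankings and are also present in the differential expression set.

    ml_rankings: dict mapping a model name to a list of gene IDs,
                 ordered from most to least important (assumed format).
    de_genes:    set of gene IDs flagged by a differential expression test.
    """
    votes = Counter()
    for ranked_genes in ml_rankings.values():
        # Each model contributes one vote per gene in its top_k.
        votes.update(set(ranked_genes[:top_k]))
    return sorted(
        gene for gene, n in votes.items()
        if n >= min_votes and gene in de_genes
    )


# Hypothetical placeholder rankings; not data from the study.
ml_rankings = {
    "random_forest": ["gene_a", "gene_b", "gene_c", "gene_d"],
    "elastic_net": ["gene_b", "gene_a", "gene_e", "gene_c"],
    "svm": ["gene_f", "gene_b", "gene_a", "gene_d"],
}
de_genes = {"gene_a", "gene_c", "gene_f"}

candidates = consensus_candidates(ml_rankings, de_genes)
print(candidates)  # ['gene_a']
```

Requiring agreement across several models and an independent differential expression test is a conservative choice: it trades sensitivity for fewer spurious candidates, which matters when genes far outnumber biological replicates.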