Bioinformatics Group, Wageningen University and Research, 6708 PB Wageningen, The Netherlands.
G3 (Bethesda). 2022 Nov 4;12(11). doi: 10.1093/g3journal/jkac255.
Expression quantitative trait locus (eQTL) mapping has been widely used to study the genetic regulation of gene expression in Arabidopsis thaliana. A large amount of eQTL data has therefore been generated for this model plant; however, only a few causal eQTL genes have been identified, and experimental validation is costly and laborious. A prioritization method could help speed up the identification of causal eQTL genes. This study extends QTG-Finder2, a machine-learning method for prioritizing candidate causal genes in phenotype quantitative trait loci, to eQTLs by adding features describing gene structure, protein interactions, and gene expression. Independent validation shows that the new algorithm ranks 16 of 25 potential causal eQTL genes within the top 20% of the ranking. Several of the new features are important for prioritizing causal eQTL genes, including the number of protein-protein interactions, unique domains, and introns. Overall, this study provides a foundation for developing computational methods to prioritize candidate causal eQTL genes. Predictions for all genes are available in the AraQTL workbench (https://www.bioinformatics.nl/AraQTL/) to support the identification of gene expression regulators in Arabidopsis.
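The validation result depends on how "within the top 20% rank" is measured. This sketch assumes one plausible reading rather than taking it from the paper. In that reading, the genes in each locus are sorted by the model's score, and a causal gene counts as a hit when its rank divided by the number of genes in the locus is at most 0.2. The Python sketch below computes that count. `percentile_rank`, `count_in_top`, and the toy loci are hypothetical and do not reproduce QTG-Finder2 or its data.

```python
from typing import Dict, List, Tuple

# Hypothetical data layout (an assumption, not the study's format): one dict per
# locus mapping gene ID -> model score, paired with the gene assumed to be causal.
Locus = Tuple[Dict[str, float], str]


def percentile_rank(scores: Dict[str, float], gene: str) -> float:
    """Return the 1-based rank of `gene` divided by the number of genes in its locus.

    Genes are sorted by descending score; Python's sort is stable (also with
    reverse=True), so tied genes keep their insertion order.
    """
    ordered = sorted(scores, key=lambda g: scores[g], reverse=True)
    return (ordered.index(gene) + 1) / len(ordered)


def count_in_top(loci: List[Locus], cutoff: float = 0.2) -> Tuple[int, int]:
    """Count loci whose causal gene has a percentile rank <= cutoff."""
    hits = sum(1 for scores, causal in loci if percentile_rank(scores, causal) <= cutoff)
    return hits, len(loci)


# Made-up toy scores for illustration only, not data from the study.
loci: List[Locus] = [
    ({"a1": 0.9, "a2": 0.5, "a3": 0.3, "a4": 0.2, "a5": 0.1}, "a1"),   # rank 1 of 5 -> 0.2, hit
    ({"b1": 0.8, "b2": 0.7, "b3": 0.1, "b4": 0.05, "b5": 0.01}, "b2"),  # rank 2 of 5 -> 0.4, miss
    ({f"c{i}": 1.0 - i / 10 for i in range(10)}, "c1"),                 # rank 2 of 10 -> 0.2, hit
]

hits, total = count_in_top(loci)
print(f"{hits} of {total} causal genes ranked in the top 20%")  # prints: 2 of 3 causal genes ranked in the top 20%
```

Under this reading, a result such as 16 of 25 corresponds to `count_in_top` returning `(16, 25)` over the validation loci. The abstract does not say how ties or very small loci are handled, so those choices here are assumptions.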