Prioritizing candidate eQTL causal genes in Arabidopsis using RANDOM FORESTS.

Affiliation

Bioinformatics Group, Wageningen University and Research, 6708 PB Wageningen, The Netherlands.

Publication information

G3 (Bethesda). 2022 Nov 4;12(11). doi: 10.1093/g3journal/jkac255.

Abstract

Expression quantitative trait locus (eQTL) mapping has been widely used to study the genetic regulation of gene expression in Arabidopsis thaliana. As a result, a large amount of eQTL data has been generated for this model plant. However, only a few causal eQTL genes have been identified, and experimental validation is costly and laborious. A prioritization method could help speed up the identification of causal eQTL genes. This study extends QTG-Finder2, a machine-learning method for prioritizing candidate causal genes in phenotype quantitative trait loci (QTLs), to eQTLs by adding features that describe gene structure, protein interactions, and gene expression. Independent validation shows that the new algorithm ranks 16 of 25 potential causal eQTL genes within the top 20% of candidates. Several of the new features are important for prioritizing causal eQTL genes, including the number of protein-protein interactions, unique domains, and introns. Overall, this study provides a foundation for developing computational methods to prioritize candidate causal eQTL genes. Predictions for all genes are available in the AraQTL workbench (https://www.bioinformatics.nl/AraQTL/) to support the identification of gene expression regulators in Arabidopsis.
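The headline result ("16 of 25 ... within the top 20%") is a rank-based evaluation. The sketch below shows one way such a metric could be computed. It assumes that every candidate gene in a locus has already been scored by a classifier, for example a random forest's predicted probability of being causal. It also assumes that "top 20%" means the causal gene's rank divided by the number of candidate genes in its locus is at most 0.2. The `Locus` class, the function names, the gene IDs and the scores are all hypothetical illustrations. They are not taken from QTG-Finder2 or AraQTL.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Locus:
    """Hypothetical eQTL locus: candidate genes, their model scores, and the known causal gene."""
    name: str
    scores: dict[str, float]  # gene ID -> predicted probability of being causal (assumed input)
    causal_gene: str


def rank_percentile(locus: Locus) -> float:
    """Return the causal gene's 1-based rank divided by the number of candidates.

    Genes are sorted by descending score, so smaller values are better;
    1 / n means the causal gene was ranked first among n candidates.
    """
    ordered = sorted(locus.scores, key=lambda gene: locus.scores[gene], reverse=True)
    rank = ordered.index(locus.causal_gene) + 1
    return rank / len(ordered)


def count_in_top_fraction(loci: list[Locus], fraction: float = 0.2) -> int:
    """Count loci whose causal gene falls within the top `fraction` of its candidates."""
    return sum(1 for locus in loci if rank_percentile(locus) <= fraction)


if __name__ == "__main__":
    # Invented toy data: two loci with five candidate genes each.
    example_loci = [
        Locus("locus_1", {"g1": 0.91, "g2": 0.40, "g3": 0.12, "g4": 0.33, "g5": 0.05}, "g1"),
        Locus("locus_2", {"g6": 0.70, "g7": 0.65, "g8": 0.20, "g9": 0.80, "g10": 0.10}, "g8"),
    ]
    for locus in example_loci:
        print(locus.name, rank_percentile(locus))  # locus_1 0.2, locus_2 0.8
    print(count_in_top_fraction(example_loci))      # 1
```

In the toy data, the causal gene of `locus_1` is ranked first of five (0.2) and counts as a hit. The causal gene of `locus_2` is ranked fourth of five (0.8) and does not. Under this reading, the reported validation would correspond to `count_in_top_fraction` returning 16 over 25 loci. Dividing by locus size makes loci with different numbers of candidate genes comparable.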
