Freda Philip J, Ghosh Attri, Zhang Elizabeth, Luo Tianhao, Chitre Apurva, Polesskaya Oksana, St Pierre Celine L, Gao Jianjun, Martin Connor D, Chen Hao, Garcia-Martinez Angel G, Wang Tengfei, Han Wenyan, Ishiwari Keita, Meyer Paul, Lamparelli Alexander, King Christopher P, Palmer Abraham A, Li Ruowang, Moore Jason H
bioRxiv. 2023 Jan 13:2023.01.12.523835. doi: 10.1101/2023.01.12.523835.
Quantitative Trait Locus (QTL) analysis and Genome-Wide Association Studies (GWAS) have the power to identify variants that capture significant levels of phenotypic variance in complex traits. However, effort and time are required to select the best methods and to optimize parameters and pre-processing steps. Although machine learning approaches have been shown to greatly assist in optimization and data processing, applying them to QTL analysis and GWAS is challenging due to the complexity of large, heterogeneous datasets. Here, we describe a proof of concept for an automated machine learning approach, AutoQTL, that can automate many complex decisions related to the analysis of complex traits and generate diverse solutions to describe relationships that exist in genetic data.
Using a dataset of 18 putative QTL from a large-scale GWAS of body mass index in the laboratory rat, AutoQTL captures the phenotypic variance explained under a standard additive model while also providing evidence, via multiple optimal solutions, of non-additive effects including deviations from additivity and 2-way epistatic interactions from simulated data. Additionally, feature importance metrics provide different insights into the inheritance models and predictive power of multiple GWAS-derived putative QTL.
This proof of concept illustrates that automated machine learning techniques can be applied to genetic data and have the potential to detect both additive and non-additive effects via various optimal solutions and feature importance metrics. In the future, we aim to expand AutoQTL to accommodate omics-level datasets with intelligent feature selection strategies.
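To make the distinction between additive and non-additive effects concrete, the following is a minimal sketch, not AutoQTL's own code or API. It assumes two hypothetical loci with simulated genotypes, arbitrary effect sizes, and plain least-squares regression, and compares the phenotypic variance explained by an additive model with models that add a heterozygote (dominance) deviation and a 2-way interaction term.

```python
import numpy as np


def r_squared(X, y):
    """Fraction of variance in y explained by an OLS fit on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n = 2000

# Hypothetical genotypes at two loci, coded as allele counts (0, 1, 2).
g1 = rng.integers(0, 3, size=n)
g2 = rng.integers(0, 3, size=n)

# Non-additive encodings (illustrative choices, not taken from the paper).
dominance = (g1 == 1).astype(float)  # heterozygote deviation at locus 1
epistasis = g1 * g2                  # 2-way interaction between the loci

# Simulated phenotype with arbitrary effect sizes plus Gaussian noise.
y = 0.5 * g1 + 0.8 * dominance + 0.4 * epistasis + rng.normal(0.0, 1.0, size=n)

# Nested models: additive only, plus dominance, plus epistasis.
r2_add = r_squared(np.column_stack([g1, g2]), y)
r2_dom = r_squared(np.column_stack([g1, g2, dominance]), y)
r2_epi = r_squared(np.column_stack([g1, g2, dominance, epistasis]), y)

print(f"additive only:          R^2 = {r2_add:.3f}")
print(f"+ dominance deviation:  R^2 = {r2_dom:.3f}")
print(f"+ 2-way epistasis:      R^2 = {r2_epi:.3f}")
```

Because the models are nested, each added term can only raise the in-sample R^2; the size of each gain is what hints at non-additive signal. An automated approach such as the one described would additionally search over encodings and models rather than fixing them by hand as this sketch does.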