Center for Human Genetics Research, Departments of Molecular Physiology & Biophysics and Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
BioData Min. 2010 Sep 27;3(1):5. doi: 10.1186/1756-0381-3-5.
Growing interest and burgeoning technology for discovering genetic mechanisms that influence disease processes have ushered in a flood of genetic association studies over the last decade, yet little heritability in highly studied complex traits has been explained by genetic variation. Non-additive gene-gene interactions, which are not often explored, are thought to be one source of this "missing" heritability.
Stochastic methods employing evolutionary algorithms have demonstrated promise in being able to detect and model gene-gene and gene-environment interactions that influence human traits. Here we demonstrate modifications to a neural network algorithm in ATHENA (the Analysis Tool for Heritable and Environmental Network Associations) resulting in clear performance improvements for discovering gene-gene interactions that influence human traits. We employed an alternative tree-based crossover, backpropagation for locally fitting neural network weights, and incorporation of domain knowledge obtainable from publicly accessible biological databases for initializing the search for gene-gene interactions. We tested these modifications in silico using simulated datasets.
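The tree-based crossover mentioned above can be illustrated with a generic subtree crossover of the kind used in genetic programming. This is a minimal sketch, not ATHENA's implementation: the nested-list network representation, the operator names, the `snp` labels, and all function names are assumptions made for illustration.

```python
import copy
import random

# Assumed representation: a network is a nested list [operator, child, ...];
# leaves are strings naming input variables (hypothetical SNP labels).

def subtree_paths(tree, path=()):
    """Return the index path of every node in the tree, root first."""
    paths = [path]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(subtree_paths(child, path + (i,)))
    return paths

def get_subtree(tree, path):
    """Follow an index path down from the root and return that node."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return copy.deepcopy(new)
    result = copy.deepcopy(tree)
    parent = get_subtree(result, path[:-1])
    parent[path[-1]] = copy.deepcopy(new)
    return result

def subtree_crossover(a, b, rng):
    """Swap one randomly chosen subtree between two parent trees."""
    pa = rng.choice(subtree_paths(a))
    pb = rng.choice(subtree_paths(b))
    child_a = replace_subtree(a, pa, get_subtree(b, pb))
    child_b = replace_subtree(b, pb, get_subtree(a, pa))
    return child_a, child_b

def leaves(tree):
    """List the input variables used by a tree, left to right."""
    if isinstance(tree, list):
        return [leaf for child in tree[1:] for leaf in leaves(child)]
    return [tree]

parent_a = ["add", "snp1", ["mul", "snp2", "snp3"]]
parent_b = ["sub", ["add", "snp4", "snp5"], "snp6"]
child_a, child_b = subtree_crossover(parent_a, parent_b, random.Random(0))
```

Because whole subtrees are exchanged, each child remains a well-formed tree and the two children together use exactly the inputs of the two parents.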
We show that the alternative tree-based crossover modification resulted in a modest increase in the sensitivity of the ATHENA algorithm for discovering gene-gene interactions. The performance increase was highly statistically significant when backpropagation was used to locally fit neural network (NN) weights. We also demonstrate that using domain knowledge to initialize the search for gene-gene interactions results in a large performance increase, especially when the search space is larger than the search coverage.
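One way to picture knowledge-driven initialization is to draw the variable pairs for the first generation in proportion to a prior score rather than uniformly, which matters most when the search cannot cover every pair. The sketch below assumes such scores already exist. The function name, the score values, and the `snp` labels are hypothetical, and the abstract does not state how ATHENA derives or applies its scores.

```python
import random

def seed_initial_pairs(pair_scores, n_models, rng):
    """Draw variable pairs for the initial population, favoring pairs with
    higher prior scores. A pair with a score of 0 is never drawn."""
    pairs = list(pair_scores)
    weights = [pair_scores[pair] for pair in pairs]
    return rng.choices(pairs, weights=weights, k=n_models)

# Hypothetical prior scores, for example counts of shared pathway annotations.
example_scores = {
    ("snp1", "snp2"): 9.0,
    ("snp1", "snp3"): 1.0,
    ("snp2", "snp3"): 0.0,
}
initial_pairs = seed_initial_pairs(example_scores, 1000, random.Random(42))
```

Sampling with replacement keeps some low-scoring pairs in the population, so the search is biased toward prior knowledge without being restricted to it.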
We show that a hybrid optimization procedure, alternative crossover strategies, and incorporation of domain knowledge from publicly available biological databases can result in marked increases in sensitivity and performance of the ATHENA algorithm for detecting and modelling gene-gene interactions that influence a complex human trait.
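The local half of such a hybrid procedure can be sketched as follows: an evolutionary step proposes a network and its starting weights, then gradient descent refines the weights. For brevity this assumes a single sigmoid output node, where backpropagation reduces to plain gradient descent on the log loss. The genotype coding (0, 1, 2), the toy data, the learning rate, and all names are illustrative assumptions, not details taken from ATHENA.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, bias, rows, labels):
    """Mean cross-entropy of a single sigmoid node over the dataset."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

def fit_weights_locally(weights, bias, rows, labels, lr=0.1, steps=200):
    """Refine starting weights with batch gradient descent on the log loss."""
    weights = list(weights)
    n = len(rows)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

# Toy genotypes at two loci (coded 0/1/2) and case/control labels.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (1, 2), (2, 2), (0, 2)]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

# Pretend these starting weights came from the evolutionary search.
start_w, start_b = [0.5, -0.5], 0.0
fit_w, fit_b = fit_weights_locally(start_w, start_b, rows, labels)
```

Comparing `log_loss` before and after the local fit shows how much of the improvement comes from weight refinement alone, with the network structure held fixed.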