Curtis David
Academic Centre for Psychiatry, St Bartholomew's and Royal London School of Medicine and Dentistry, Royal London Hospital, Whitechapel, London, UK.
BMC Genet. 2007 Jul 18;8:49. doi: 10.1186/1471-2156-8-49.
Debate remains as to the optimal method for utilising genotype data obtained from multiple markers in case-control association studies. My colleagues and I have previously described a method of association analysis using artificial neural networks (ANNs), whose performance compared favourably with that of single-marker methods. Here, the performance of ANN analysis is compared with that of other multi-marker methods, comprising different haplotype-based analyses and locus-based analyses.
Of the several methods applied to simulated SNP datasets, heterogeneity testing of estimated haplotype frequencies using asymptotic p values rather than permutation testing had the lowest power, and ANN analysis had the highest. The difference in power to detect association between these two methods was statistically significant (p = 0.001), but other comparisons between methods were not significant. The raw t statistic obtained from ANN analysis correlated highly with the empirical statistical significance obtained from permutation testing of the ANN results and with the p value obtained from the heterogeneity test.
Although ANN analysis was more powerful than the standard haplotype-based test, it is unlikely to be taken up widely. The permutation testing necessary to obtain a valid p value makes it slow to perform, and it is not underpinned by a theoretical model relating marker genotypes to disease phenotype. Nevertheless, the superior performance of this method implies that the widely used haplotype-based methods for detecting association with multiple markers are not optimal and that efforts could be made to improve upon them. Because the t statistic obtained from ANN analysis is highly correlated with the statistical significance, ANN analysis could be useful where large numbers of markers have been genotyped: the t value could serve as a proxy for the p value in preliminary analyses.
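The relationship between a raw t statistic and an empirical p value obtained by permutation can be illustrated with a minimal sketch. It is not the implementation used in the study. It assumes that each subject has a single numeric score (standing in for an ANN output), that the statistic is a Welch t comparing cases with controls, and that the test is two-sided. The function names `welch_t` and `permutation_p` and the toy data are hypothetical. In a full analysis the network would presumably be retrained for every permuted dataset, which is the step that makes the procedure slow and which is not shown here.

```python
import random
import statistics


def welch_t(cases, controls):
    """Welch's t statistic for the difference in mean score between two groups."""
    mean_diff = statistics.fmean(cases) - statistics.fmean(controls)
    std_err = (statistics.variance(cases) / len(cases)
               + statistics.variance(controls) / len(controls)) ** 0.5
    return mean_diff / std_err


def permutation_p(scores, labels, n_perm=999, seed=0):
    """Return (observed t, empirical two-sided p) by shuffling case/control labels.

    labels: 1 for a case, 0 for a control (an assumed encoding).
    """
    rng = random.Random(seed)

    def stat(lbls):
        cases = [s for s, l in zip(scores, lbls) if l == 1]
        controls = [s for s, l in zip(scores, lbls) if l == 0]
        return welch_t(cases, controls)

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real score/phenotype association
        if abs(stat(shuffled)) >= abs(observed):
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up scores for six cases followed by six controls.
    toy_scores = [0.90, 0.80, 0.85, 0.70, 0.95, 0.75,
                  0.20, 0.30, 0.25, 0.40, 0.10, 0.35]
    toy_labels = [1] * 6 + [0] * 6
    t_value, p_value = permutation_p(toy_scores, toy_labels)
    print(f"t = {t_value:.2f}, empirical p = {p_value:.4f}")
```

Computing the raw t requires one pass over the data, whereas the empirical p value requires `n_perm` further evaluations, which is why a t value that tracks the p value closely is attractive for preliminary screening.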