Thornblad Tobias A, Elliott Kate S, Jowett Jeremy, Visscher Peter M
Genetic Epidemiology, Queensland Institute of Medical Research, Brisbane, Australia.
Twin Res Hum Genet. 2007 Dec;10(6):861-70. doi: 10.1375/twin.10.6.861.
The prioritization of genes within a candidate genomic region is an important step in the identification of causal gene variants affecting complex traits. Surprisingly, there have been very few reports of bioinformatics tools to perform such prioritization. The purpose of this article is to compare the performance of 3 available positional candidate gene software tools: PosMed, GeneSniffer and SUSPECTS. The comparison was made for 40, 20 and 10 Mb regions in the human genome centred on known susceptibility genes for 4 common diseases: breast cancer, Crohn's disease, age-related macular degeneration and schizophrenia. The known susceptibility gene was not always ranked highly, and in some cases was not ranked at all, by 1 or more of the software tools. The 3 tools varied widely in which genes they prioritized and in the rank order they assigned. PosMed and GeneSniffer produced the most similar prioritized gene lists, whereas SUSPECTS identified the same candidate genes only for the narrowest (10 Mb) regions. Combining 2 or all 3 of the tools gave superior ranking of positional candidates. It is therefore possible to reduce the number of candidate genes from a starting set in a region of interest by combining a variety of candidate gene finding tools, and we recommend caution in relying solely on a single positional candidate gene prioritization tool. Our results confirm the obvious: starting with a narrower positional region gives a higher likelihood that the true susceptibility gene is selected and that it is ranked highly. A narrow confidence interval for the mapping of complex trait genes by linkage can be achieved by maximizing marker informativeness and by having large samples. Our results suggest that the best approach to identifying a minimum set of candidate genes is to take those genes that are prioritized by multiple prioritization tools.
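The recommendation to keep genes that are prioritized by multiple tools can be illustrated with a minimal sketch. It is not the procedure used in the study, and it does not call any real PosMed, GeneSniffer or SUSPECTS interface. The function name `consensus_candidates`, the `min_tools` threshold, the tie-breaking by mean rank and the placeholder gene symbols are all assumptions made for illustration, and each tool's output is assumed to have been exported already as an ordered list of gene symbols, best first, with no symbol repeated within a list.

```python
from collections import defaultdict


def consensus_candidates(rankings, min_tools=2):
    """Return genes prioritized by at least `min_tools` tools.

    rankings: dict mapping a tool name to its list of gene symbols,
    ordered best first (hypothetical input format).
    Returns (gene, n_tools, mean_rank) tuples sorted by number of
    supporting tools (descending), then mean rank (ascending),
    then gene symbol.
    """
    ranks = defaultdict(list)
    for genes in rankings.values():
        for position, gene in enumerate(genes, start=1):
            ranks[gene].append(position)

    scored = [
        (gene, len(r), sum(r) / len(r))
        for gene, r in ranks.items()
        if len(r) >= min_tools
    ]
    scored.sort(key=lambda item: (-item[1], item[2], item[0]))
    return scored


# Toy input with placeholder symbols; these are not results from the study.
toy_rankings = {
    "PosMed": ["GENE_A", "GENE_B", "GENE_C"],
    "GeneSniffer": ["GENE_B", "GENE_A", "GENE_D"],
    "SUSPECTS": ["GENE_E", "GENE_B"],
}

for gene, n_tools, mean_rank in consensus_candidates(toy_rankings):
    print(f"{gene}\ttools={n_tools}\tmean_rank={mean_rank:.2f}")
```

With the toy input, only `GENE_B` (listed by all 3 tools) and `GENE_A` (listed by 2) survive the default threshold. Raising `min_tools` to 3 corresponds to requiring agreement among all tools, which shortens the list further at the risk of dropping a true susceptibility gene that one tool failed to rank.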