Gaulton Kyle J, Mohlke Karen L, Vision Todd J
Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA.
Bioinformatics. 2007 May 1;23(9):1132-40. doi: 10.1093/bioinformatics/btm001. Epub 2007 Jan 19.
Identification of the genetic variation underlying complex traits is challenging. The wealth of information publicly available about the biology of complex traits and the function of individual genes permits the development of informatics-assisted methods for the selection of candidate genes for these traits.
We have developed a computational system named CAESAR that ranks all annotated human genes as candidates for a complex trait by using ontologies to semantically map natural language descriptions of the trait with a variety of gene-centric information sources. In a test of its effectiveness, CAESAR successfully selected 7 out of 18 (39%) complex human trait susceptibility genes within the top 2% of ranked candidates genome-wide, a subset that represents roughly 1% of genes in the human genome and provides sufficient enrichment for an association study of several hundred human genes. This approach can be applied to any well-documented mono- or multi-factorial trait in any organism for which an annotated gene set exists.
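The ranking idea and the reported enrichment can be illustrated with a minimal sketch. This is not CAESAR's actual algorithm, which the abstract does not specify. It assumes that ontology terms extracted from a trait description carry a relevance weight and that each gene is scored by summing the weights of the terms it is annotated with. The gene names, term identifiers, weights, and function names below are all hypothetical. The second function only restates the abstract's figures as arithmetic: 7 of 18 known genes recovered in the top 2% of the ranking.

```python
def rank_genes(trait_term_weights, gene_annotations):
    """Rank genes by the summed weight of ontology terms shared with the trait.

    Illustrative scoring only; this is an assumption, not the CAESAR method.
    """
    scores = {
        gene: sum(trait_term_weights.get(term, 0.0) for term in terms)
        for gene, terms in gene_annotations.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


def fold_enrichment(hits, total_known, top_fraction):
    """Recall within the top fraction, divided by the recall expected by chance."""
    return (hits / total_known) / top_fraction


# Hypothetical ontology terms extracted from a trait description, with weights.
trait_term_weights = {
    "TERM:glucose_metabolism": 2.0,
    "TERM:insulin_secretion": 1.0,
}

# Hypothetical gene-to-term annotations.
gene_annotations = {
    "GENE_A": {"TERM:glucose_metabolism", "TERM:insulin_secretion"},
    "GENE_B": {"TERM:insulin_secretion"},
    "GENE_C": {"TERM:cell_adhesion"},
}

ranking = rank_genes(trait_term_weights, gene_annotations)
enrichment = fold_enrichment(hits=7, total_known=18, top_fraction=0.02)

print(ranking)
print(round(enrichment, 1))
```

Under these assumptions, recovering 7 of 18 genes in the top 2% is roughly a 19-fold enrichment over a random ranking, which would place only about 0.36 of the 18 genes in that fraction.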
CAESAR scripts and test data can be downloaded from http://visionlab.bio.unc.edu/caesar/