A computational system to select candidate genes for complex human traits.

Authors

Gaulton Kyle J, Mohlke Karen L, Vision Todd J

Affiliation

Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA.

Publication information

Bioinformatics. 2007 May 1;23(9):1132-40. doi: 10.1093/bioinformatics/btm001. Epub 2007 Jan 19.

DOI:10.1093/bioinformatics/btm001

PMID:17237041

Abstract

MOTIVATION

Identification of the genetic variation underlying complex traits is challenging. The wealth of information publicly available about the biology of complex traits and the function of individual genes permits the development of informatics-assisted methods for the selection of candidate genes for these traits.

RESULTS

We have developed a computational system named CAESAR that ranks all annotated human genes as candidates for a complex trait by using ontologies to semantically map natural language descriptions of the trait with a variety of gene-centric information sources. In a test of its effectiveness, CAESAR successfully selected 7 out of 18 (39%) complex human trait susceptibility genes within the top 2% of ranked candidates genome-wide, a subset that represents roughly 1% of genes in the human genome and provides sufficient enrichment for an association study of several hundred human genes. This approach can be applied to any well-documented mono- or multi-factorial trait in any organism for which an annotated gene set exists.
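The evaluation described above (rank every gene, then count how many known susceptibility genes fall within a top fraction of the list) can be illustrated with a minimal sketch. It is not CAESAR's actual method: the scoring here is a plain term-overlap count standing in for the ontology-based semantic mapping, and the gene names, annotation terms, and function names are all hypothetical.

```python
from math import ceil


def rank_genes(trait_terms, gene_annotations):
    """Rank genes by how many trait terms appear in their annotations.

    Toy stand-in for a semantic similarity score (an assumption, not
    the scoring used by CAESAR).
    """
    trait = set(trait_terms)
    scores = {
        gene: len(trait & set(terms))
        for gene, terms in gene_annotations.items()
    }
    # Sort by descending score, then by gene name for a deterministic order.
    return sorted(scores, key=lambda g: (-scores[g], g))


def recovered_in_top(ranked, known_genes, fraction):
    """Count known genes that fall within the top `fraction` of the ranking."""
    cutoff = ceil(len(ranked) * fraction)
    top = set(ranked[:cutoff])
    return sum(1 for g in known_genes if g in top)


# Hypothetical annotations for four made-up genes.
gene_annotations = {
    "GENE_A": ["insulin", "glucose", "secretion"],
    "GENE_B": ["glucose", "transport"],
    "GENE_C": ["keratin", "hair"],
    "GENE_D": ["insulin", "receptor"],
}
trait_terms = ["insulin", "glucose", "diabetes"]

ranked = rank_genes(trait_terms, gene_annotations)
hits = recovered_in_top(ranked, ["GENE_A", "GENE_C"], 0.5)

# The reported recovery rate is simple arithmetic: 7 of 18 genes.
reported_rate = 100 * 7 / 18  # about 38.9, reported as 39%
```

With the real system, the cutoff fraction would be 0.02 and the ranked list would cover all annotated human genes; the toy data above use 0.5 only because the list has four entries.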

AVAILABILITY

CAESAR scripts and test data can be downloaded from http://visionlab.bio.unc.edu/caesar/

Similar articles

TraitMap: an XML-based genetic-map database combining multigenic loci and biomolecular networks.

Bioinformatics. 2004 Aug 4;20 Suppl 1:i152-60. doi: 10.1093/bioinformatics/bth940.

Inter-species normalization of gene mentions with GNAT.

Bioinformatics. 2008 Aug 15;24(16):i126-132. doi: 10.1093/bioinformatics/btn299.

Gene symbol disambiguation using knowledge-based profiles.

Bioinformatics. 2007 Apr 15;23(8):1015-22. doi: 10.1093/bioinformatics/btm056. Epub 2007 Feb 21.

Fast parsers for Entrez Gene.

Bioinformatics. 2005 Jul 15;21(14):3189-90. doi: 10.1093/bioinformatics/bti488. Epub 2005 May 6.

RelEx--relation extraction using dependency parse trees.

Bioinformatics. 2007 Feb 1;23(3):365-71. doi: 10.1093/bioinformatics/btl616. Epub 2006 Dec 1.

SCA db: spinocerebellar ataxia candidate gene database.

Bioinformatics. 2004 Nov 1;20(16):2656-61. doi: 10.1093/bioinformatics/bth305. Epub 2004 Jun 24.

Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.

Bioinformatics. 2005 Apr 15;21(8):1653-8. doi: 10.1093/bioinformatics/bti165. Epub 2004 Nov 25.

Identification of Parkinson's disease candidate genes using CAESAR and screening of MAPT and SNCAIP in South African Parkinson's disease patients.

J Neural Transm (Vienna). 2011 Jun;118(6):889-97. doi: 10.1007/s00702-011-0591-z. Epub 2011 Feb 23.

Comparison of character-level and part of speech features for name recognition in biomedical texts.

J Biomed Inform. 2004 Dec;37(6):423-35. doi: 10.1016/j.jbi.2004.08.008.

Cited by

Text mining in cancer gene and pathway prioritization.

Cancer Inform. 2014 Oct 13;13(Suppl 1):69-79. doi: 10.4137/CIN.S13874. eCollection 2014.

Approaches for recognizing disease genes based on network.

Biomed Res Int. 2014;2014:416323. doi: 10.1155/2014/416323. Epub 2014 Feb 24.

ProphNet: a generic prioritization method through propagation of information.

BMC Bioinformatics. 2014;15 Suppl 1(Suppl 1):S5. doi: 10.1186/1471-2105-15-S1-S5. Epub 2014 Jan 10.

Integrating human omics data to prioritize candidate genes.

BMC Med Genomics. 2013 Dec 18;6:57. doi: 10.1186/1755-8794-6-57.

Chapter 15: disease gene prioritization.

PLoS Comput Biol. 2013 Apr;9(4):e1002902. doi: 10.1371/journal.pcbi.1002902. Epub 2013 Apr 25.

Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks.

PLoS Comput Biol. 2012;8(9):e1002690. doi: 10.1371/journal.pcbi.1002690. Epub 2012 Sep 27.

Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles.

Genome Med. 2012 Sep 28;4(9):75. doi: 10.1186/gm376. eCollection 2012.

Candidate gene prioritization.

Mol Genet Genomics. 2012 Sep;287(9):679-98. doi: 10.1007/s00438-012-0710-z. Epub 2012 Aug 15.

Constructing a gene semantic similarity network for the inference of disease genes.

BMC Syst Biol. 2011;5 Suppl 2(Suppl 2):S2. doi: 10.1186/1752-0509-5-S2-S2. Epub 2011 Dec 14.

GPSy: a cross-species gene prioritization system for conserved biological processes--application in male gamete development.

Nucleic Acids Res. 2012 Jul;40(Web Server issue):W458-65. doi: 10.1093/nar/gks380. Epub 2012 May 8.
