Inferring higher functional information for RIKEN mouse full-length cDNA clones with FACTS.

Author information

Nagashima Takeshi, Silva Diego G, Petrovsky Nikolai, Socha Luis A, Suzuki Harukazu, Saito Rintaro, Kasukawa Takeya, Kurochkin Igor V, Konagaya Akihiko, Schönbach Christian

Affiliation

Biomedical Knowledge Discovery Team, Bioinformatics Group, RIKEN Genomic Sciences Center (GSC), Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan.

Publication information

Genome Res. 2003 Jun;13(6B):1520-33. doi: 10.1101/gr.1019903.

Abstract

FACTS (Functional Association/Annotation of cDNA Clones from Text/Sequence Sources) is a semiautomated knowledge discovery and annotation system that integrates molecular function information derived from sequence analysis results (sequence inferred) with functional information extracted from text. Text-inferred information was extracted from keyword-based retrievals of MEDLINE abstracts and by matching of gene or protein names to OMIM, BIND, and DIP database entries. Using FACTS, we found that 47.5% of the 60,770 RIKEN mouse cDNA FANTOM2 clone annotations were informative for text searches. MEDLINE queries yielded molecular interaction-containing sentences for 23.1% of the clones. When disease MeSH and GO terms were matched with retrieved abstracts, 22.7% of clones were associated with potential diseases, and 32.5% with GO identifiers. A significant number (23.5%) of disease MeSH-associated clones were also found to have a hereditary disease association (OMIM Morbidmap). Inferred neoplastic and nervous system disease represented 49.6% and 36.0% of disease MeSH-associated clones, respectively. A comparison of sequence-based GO assignments with informative text-based GO assignments revealed that for 78.2% of clones, identical GO assignments were provided for that clone by either method, whereas for 21.8% of clones, the assignments differed. In contrast, for OMIM assignments, only 28.5% of clones had identical sequence-based and text-based OMIM assignments. Sequence, sentence, and term-based functional associations are included in the FACTS database (http://facts.gsc.riken.go.jp/), which permits results to be annotated and explored through web-accessible keyword and sequence search interfaces. The FACTS database will be a critical tool for investigating the functional complexity of the mouse transcriptome, cDNA-inferred interactome (molecular interactions), and pathome (pathologies).
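The comparison of sequence-based and text-based GO assignments reported in the abstract can be illustrated with a minimal sketch. It assumes each method yields a set of GO identifiers per clone and that "identical" means the two sets are equal; the abstract does not state the exact matching criterion, so this is an assumption rather than the FACTS implementation. The function name `concordance`, the clone IDs, and the GO identifiers below are hypothetical and serve only as illustration.

```python
from typing import Dict, Set


def concordance(seq_based: Dict[str, Set[str]],
                text_based: Dict[str, Set[str]]) -> float:
    """Fraction of clones annotated by both methods whose GO sets are equal.

    Assumption: 'identical assignment' is modelled as exact set equality.
    """
    # Only clones that received an assignment from both methods are compared.
    shared = seq_based.keys() & text_based.keys()
    if not shared:
        return 0.0
    identical = sum(1 for clone in shared
                    if seq_based[clone] == text_based[clone])
    return identical / len(shared)


# Made-up clone IDs and GO identifiers, for illustration only.
seq_go = {
    "cloneA": {"GO:0005515"},
    "cloneB": {"GO:0003677"},
    "cloneC": {"GO:0016301"},
}
text_go = {
    "cloneA": {"GO:0005515"},
    "cloneB": {"GO:0005634"},
    "cloneD": {"GO:0003677"},
}

rate = concordance(seq_go, text_go)
print(f"{rate:.1%} of shared clones have identical GO assignments")
```

Restricting the comparison to clones annotated by both methods keeps the identical and differing fractions complementary, as with the 78.2% and 21.8% figures in the abstract.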

Similar articles

1
Inferring higher functional information for RIKEN mouse full-length cDNA clones with FACTS.
Genome Res. 2003 Jun;13(6B):1520-33. doi: 10.1101/gr.1019903.
2
FANTOM DB: database of Functional Annotation of RIKEN Mouse cDNA Clones.
Nucleic Acids Res. 2002 Jan 1;30(1):116-8. doi: 10.1093/nar/30.1.116.
3
Development and evaluation of an automated annotation pipeline and cDNA annotation system.
Genome Res. 2003 Jun;13(6B):1542-51. doi: 10.1101/gr.992803.
6
FREP: a database of functional repeats in mouse cDNAs.
Nucleic Acids Res. 2004 Jan 1;32(Database issue):D471-5. doi: 10.1093/nar/gkh123.
7
RIKEN mouse genome encyclopedia.
Mech Ageing Dev. 2003 Jan;124(1):93-102. doi: 10.1016/s0047-6374(02)00173-2.
8
READ: RIKEN Expression Array Database.
Nucleic Acids Res. 2002 Jan 1;30(1):211-3. doi: 10.1093/nar/30.1.211.
9
Systematic expression profiling of the mouse transcriptome using RIKEN cDNA microarrays.
Genome Res. 2003 Jun;13(6B):1318-23. doi: 10.1101/gr.1075103.
10
EBIMed--text crunching to gather facts for proteins from Medline.
Bioinformatics. 2007 Jan 15;23(2):e237-44. doi: 10.1093/bioinformatics/btl302.
