Tsuruoka Yoshimasa, Tsujii Jun'ichi, Ananiadou Sophia
School of Computer Science, The University of Manchester, Manchester, UK.
Bioinformatics. 2008 Nov 1;24(21):2559-60. doi: 10.1093/bioinformatics/btn469. Epub 2008 Sep 4.
FACTA is a text search engine for MEDLINE abstracts, designed specifically to help users browse the biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) that appear in the documents retrieved by a query. The concepts are presented to the user in tabular form and ranked by co-occurrence statistics. Unlike existing systems that provide similar functionality, FACTA pre-indexes not only the words but also the concepts mentioned in the documents. This allows the user to issue flexible queries (e.g. free keywords or Boolean combinations of keywords/concepts) and receive results immediately, even when the number of documents matching the query is very large. The user can also view snippets from MEDLINE to obtain textual evidence of associations between the query terms and the concepts. The concept IDs and their names/synonyms used to build the indexes were collected from several biomedical databases and thesauri, such as UniProt, BioThesaurus, UMLS, KEGG and DrugBank.
The system is available at http://www.nactem.ac.uk/software/facta/
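The idea of pre-indexing both words and concepts can be illustrated with a minimal sketch. It is not FACTA's implementation: the toy documents, the concept IDs, the function names and the use of a raw co-occurrence count as the ranking statistic are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter, defaultdict

# Toy corpus (invented for illustration): each document carries its words
# and the concept IDs that were recognised in it ahead of query time.
DOCS = {
    1: {"words": {"nicotine", "addiction"},
        "concepts": {"COMPOUND:nicotine", "DISEASE:addiction"}},
    2: {"words": {"nicotine", "receptor"},
        "concepts": {"COMPOUND:nicotine", "GENE:CHRNA4"}},
    3: {"words": {"nicotine", "addiction", "receptor"},
        "concepts": {"COMPOUND:nicotine", "DISEASE:addiction", "GENE:CHRNA4"}},
    4: {"words": {"aspirin", "pain"},
        "concepts": {"COMPOUND:aspirin"}},
}


def build_indexes(docs):
    """Build two inverted indexes: word -> doc IDs and concept -> doc IDs."""
    word_index = defaultdict(set)
    concept_index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["words"]:
            word_index[word].add(doc_id)
        for concept in doc["concepts"]:
            concept_index[concept].add(doc_id)
    return word_index, concept_index


def search_and(word_index, terms):
    """Return the IDs of documents that contain every query term (Boolean AND)."""
    postings = [word_index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def rank_concepts(concept_index, hits):
    """Rank concepts by how many of the matching documents mention them.

    A raw co-occurrence count is an assumed stand-in for the ranking
    statistic; ties are broken alphabetically by concept ID.
    """
    counts = Counter()
    for concept, doc_ids in concept_index.items():
        overlap = len(doc_ids & hits)
        if overlap:
            counts[concept] = overlap
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    word_index, concept_index = build_indexes(DOCS)
    hits = search_and(word_index, ["nicotine"])
    for concept, count in rank_concepts(concept_index, hits):
        print(concept, count)
```

Because both indexes are built ahead of time, answering a query in this sketch reduces to set intersections over posting lists, with no need to scan document text at query time.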