AZuRE, a scalable system for automated term disambiguation of gene and protein names.

Author information

Podowski Raf M, Cleary John G, Goncharoff Nicholas T, Amoutzias Gregory, Hayes William S

Affiliations

AstraZeneca R&D Boston and Karolinska Institutet.

Publication information

Proc IEEE Comput Syst Bioinform Conf. 2004:415-24. doi: 10.1109/csb.2004.1332454.

Abstract

Researchers, hindered by the lack of standard gene- and protein-naming conventions, endure long and sometimes fruitless literature searches. This paper describes a system that automatically assigns gene names to their LocusLink IDs (LLIDs) in previously unseen MEDLINE abstracts. The system is based on supervised learning and builds one model per LLID. The training sets for all LLIDs are extracted automatically from MEDLINE references in the LocusLink and SwissProt databases. Performance was validated for all 20,546 human genes with LLIDs. Of these, 7,344 produced good-quality models (F-measure > 0.7; nearly 60% of these exceeded 0.9) and 13,202 did not, mainly because too few document references were known. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. The authors conclude that high-quality gene disambiguation is achievable with scalable automated techniques.
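
To make the quality thresholds in the abstract concrete, the following minimal sketch computes an F-measure from raw counts and bins a per-gene model by the 0.7 and 0.9 cut-offs. It is an illustration only, not the authors' code: the abstract does not say which F-measure variant was used, so the balanced form (F1) is assumed here, and the names `f_measure` and `model_quality` are hypothetical.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive and
    false-negative counts. Assumption: the abstract's "F-measure" is F1."""
    if tp == 0:
        # No correct assignments: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def model_quality(score: float, good: float = 0.7, high: float = 0.9) -> str:
    """Bin a per-LLID model by the thresholds quoted in the abstract.
    The bin labels are hypothetical names chosen for this sketch."""
    if score > high:
        return "high"
    if score > good:
        return "good"
    return "insufficient"


# Example: a model that finds 8 of 10 relevant abstracts with 2 false hits.
score = f_measure(tp=8, fp=2, fn=2)
print(model_quality(score))
```

Under these assumptions, a model with precision and recall both at 0.8 clears the 0.7 threshold but not the 0.9 one.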
