The LAILAPS search engine: relevance ranking in life science databases.

Authors

Lange Matthias, Spies Karl, Bargsten Joachim, Haberhauer Gregor, Klapperstück Matthias, Leps Michael, Weinel Christian, Wünschiers Röbbe, Weissbach Mandy, Stein Jens, Scholz Uwe

Affiliation

Research Group Bioinformatics and Information Technology, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.

Publication information

J Integr Bioinform. 2010 Jan 15;7(2):110. doi: 10.2390/biecoll-jib-2010-110.

Abstract

Search engines and retrieval systems are popular tools at the life science desktop. Manually inspecting hundreds of database entries that reflect a life science concept or fact is time-intensive daily work. What matters is not the number of query results but their relevance. In this paper, we present the LAILAPS search engine for life science databases. The concept combines a novel feature model for relevance ranking, a machine learning approach to model user relevance profiles, ranking improvement through user feedback tracking, and an intuitive, slim web user interface that estimates relevance rank by tracking user interactions. Queries are formulated as simple keyword lists and are expanded with synonyms. Because it supports a flexible text index and a simple data import format, LAILAPS can easily be used as a search engine both for comprehensive integrated life science databases and for small in-house project databases. From a set of features extracted from each database hit, combined with user relevance preferences, a neural network predicts user-specific relevance scores. A reliable relevance ranking of database hits has been implemented, using either expert knowledge as training data for a predefined neural network or users' own relevance training sets. We describe the LAILAPS system, its concepts, benchmarks, and use cases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
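The pipeline outlined in the abstract (keyword query, synonym expansion, per-hit feature extraction, learned relevance score) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synonym table, the two features, the weights, and the single logistic unit standing in for the neural network are hypothetical and are not taken from LAILAPS itself.

```python
import math

# Hypothetical synonym table; the abstract does not say where LAILAPS gets its synonyms.
SYNONYMS = {
    "kinase": ["phosphotransferase"],
    "barley": ["hordeum vulgare"],
}


def expand_query(keywords):
    """Return the lower-cased keywords plus known synonyms, in order, without duplicates."""
    expanded = []
    for kw in keywords:
        key = kw.lower()
        for term in [key] + SYNONYMS.get(key, []):
            if term not in expanded:
                expanded.append(term)
    return expanded


def extract_features(entry_text, terms):
    """Compute two illustrative features for one database hit.

    coverage:  fraction of query terms that occur in the entry text
    frequency: total term occurrences divided by the entry's word count
    """
    text = entry_text.lower()
    words = text.split()
    matched = sum(1 for t in terms if t in text)
    occurrences = sum(text.count(t) for t in terms)
    coverage = matched / len(terms) if terms else 0.0
    frequency = occurrences / len(words) if words else 0.0
    return [coverage, frequency]


def relevance_score(features, weights, bias):
    """Single logistic unit; a stand-in for a trained neural network."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def rank(entries, keywords, weights, bias):
    """Sort entries by descending predicted relevance for the expanded query."""
    terms = expand_query(keywords)
    scored = [
        (relevance_score(extract_features(entry, terms), weights, bias), entry)
        for entry in entries
    ]
    return [entry for _, entry in sorted(scored, key=lambda pair: pair[0], reverse=True)]


if __name__ == "__main__":
    hits = [
        "Unrelated ribosomal protein",
        "Protein kinase from Hordeum vulgare",
    ]
    # Hypothetical weights; in a real system they would be learned from relevance training sets.
    for hit in rank(hits, ["kinase", "barley"], weights=[4.0, 2.0], bias=-2.0):
        print(hit)
```

In this sketch, a user-specific relevance profile would correspond to a separate set of weights per user, fitted to that user's own relevance judgments.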
