Automating document classification for the Immune Epitope Database.

Authors

Wang Peng, Morgan Alexander A, Zhang Qing, Sette Alessandro, Peters Bjoern

Affiliation

The La Jolla Institute for Allergy and Immunology, La Jolla, CA 92037, USA.

Publication

BMC Bioinformatics. 2007 Jul 26;8:269. doi: 10.1186/1471-2105-8-269.

Abstract

BACKGROUND

The Immune Epitope Database contains information on immune epitopes curated manually from the scientific literature. Like similar projects in other knowledge domains, significant effort is spent on identifying which articles are relevant for this purpose.

RESULTS

Here we report our experience in automating this process using Naïve Bayes classifiers trained on 20,910 abstracts classified by domain experts. We improved on the basic classifier's performance by a) utilizing information stored in PubMed beyond the abstract itself, b) applying standard feature selection criteria, and c) extracting domain-specific feature patterns that, for example, identify peptide sequences. We have integrated the classifier into the curation process, where it determines whether an abstract is clearly relevant, clearly irrelevant, or cannot be classified with certainty, in which case the abstract is classified manually. Testing this classification scheme on an independent dataset, we achieve 95% sensitivity and specificity in the 51.1% of abstracts that were automatically classified.
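The three-way scheme described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the peptide pattern, the `__PEPTIDE__` token, the smoothing constant, the thresholds of ±2.0 on the log-odds, and the toy training abstracts are all assumptions chosen for illustration, and the abstract does not state the actual features or cutoffs used.

```python
import math
import re
from collections import Counter

# Hypothetical domain-specific pattern: a run of 8 or more one-letter
# amino acid codes, treated as a sign that a peptide sequence is mentioned.
PEPTIDE_RE = re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]{8,}\b")


def tokenize(text):
    """Lowercased word tokens, plus one synthetic token for peptide-like strings."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if PEPTIDE_RE.search(text):
        tokens.append("__PEPTIDE__")
    return tokens


class NaiveBayes:
    """Two-class multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing constant

    def fit(self, texts, labels):
        # Label 1 = relevant, label 0 = irrelevant.
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        return self

    def log_odds(self, text):
        """Return log P(relevant | text) - log P(irrelevant | text)."""
        score = math.log(self.docs[1] / self.docs[0])  # class prior
        v = len(self.vocab)
        for tok in tokenize(text):
            if tok not in self.vocab:
                continue  # tokens never seen in training carry no evidence
            p1 = (self.counts[1][tok] + self.alpha) / (self.totals[1] + self.alpha * v)
            p0 = (self.counts[0][tok] + self.alpha) / (self.totals[0] + self.alpha * v)
            score += math.log(p1 / p0)
        return score


def triage(score, low=-2.0, high=2.0):
    """Map a log-odds score to one of three outcomes (thresholds are assumed)."""
    if score >= high:
        return "relevant"
    if score <= low:
        return "irrelevant"
    return "manual"  # no confident call: route to a human curator


# Toy training data, invented for this sketch only.
TRAIN_TEXTS = [
    "T cell epitope SIINFEKL binds MHC class I",
    "antibody epitope mapping of peptide GILGFVFTL",
    "crystal structure of a bacterial enzyme",
    "clinical trial of a blood pressure drug",
]
TRAIN_LABELS = [1, 1, 0, 0]

clf = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
for abstract in ["epitope peptide NLVPMVATV", "enzyme structure trial", "weather report"]:
    print(abstract, "->", triage(clf.log_odds(abstract)))
```

Widening the gap between `low` and `high` sends more abstracts to manual review and should raise accuracy on the automatically classified remainder; narrowing it does the opposite. In practice the thresholds would be tuned on held-out expert-labelled abstracts.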

CONCLUSION

By implementing text classification, we have sped up the reference selection process without sacrificing the sensitivity or specificity of the human expert classification. This study provides both practical recommendations for users of text classification tools and a large dataset that can serve as a benchmark for tool developers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/966d/1965490/4b576a542535/1471-2105-8-269-1.jpg
