Literature classification for semi-automated updating of biological knowledgebases.

Authors

Olsen Lars, Johan Kudahl Ulrich, Winther Ole, Brusic Vladimir

Publication information

BMC Genomics. 2013;14 Suppl 5(Suppl 5):S14. doi: 10.1186/1471-2164-14-S5-S14. Epub 2013 Oct 16.

Abstract

BACKGROUND

As the output of biological assays increases in resolution and volume, the body of specialized biological data, such as functional annotations of gene and protein sequences, enables extraction of the higher-level knowledge needed for practical application in bioinformatics. Whereas common types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or as free text embedded in the primary scientific literature.

RESULTS

We defined and applied a machine learning approach for literature classification to support updating of TANTIGEN, a knowledgebase of tumor T-cell antigens. Abstracts from PubMed were downloaded and classified as either "relevant" or "irrelevant" for database update. Training and five-fold cross-validation of a k-NN classifier on 310 abstracts yielded a classification accuracy of 0.95, showing significant value in supporting data extraction from the literature.
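The classification step described above can be illustrated with a minimal sketch. The abstract does not state how the texts were represented or which distance the k-NN classifier used, so the TF-IDF weighting, cosine similarity, k = 3, and the round-robin fold assignment below are all assumptions, and the ten keyword-style snippets are hypothetical stand-ins for PubMed abstracts, not data from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency, computed on training docs only."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(doc, idf):
    """L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(term for term in doc if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {term: w / norm for term, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(pair[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validate(texts, labels, k=3, folds=5):
    """Accuracy over all held-out items; item i goes to fold i % folds (assumed scheme)."""
    docs = [tokenize(t) for t in texts]
    correct = 0
    for fold in range(folds):
        train_idx = [i for i in range(len(docs)) if i % folds != fold]
        test_idx = [i for i in range(len(docs)) if i % folds == fold]
        idf = fit_idf([docs[i] for i in train_idx])
        train_vecs = [vectorize(docs[i], idf) for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        for i in test_idx:
            pred = knn_predict(train_vecs, train_labels, vectorize(docs[i], idf), k)
            correct += pred == labels[i]
    return correct / len(docs)


# Hypothetical toy snippets (not from the study), alternating relevant/irrelevant.
TEXTS = [
    "tumor antigen epitope recognized by cytotoxic lymphocytes",
    "soil bacteria diversity across farmland samples",
    "melanoma tumor peptide presented on HLA molecules",
    "soil nitrogen cycling under crop rotation",
    "tumor epitope mapping with cytotoxic lymphocytes",
    "crop yield and soil moisture measurements",
    "HLA restricted tumor antigen peptide identified",
    "farmland soil bacteria genome assembly",
    "melanoma tumor antigen elicits lymphocytes response",
    "soil moisture sensors for crop irrigation",
]
LABELS = ["relevant", "irrelevant"] * 5

if __name__ == "__main__":
    print("five-fold accuracy:", cross_validate(TEXTS, LABELS))
```

The IDF weights are refitted on the training folds inside each iteration so that no vocabulary statistics leak from the held-out abstracts. The toy snippets share no vocabulary across the two classes, so this sketch separates them trivially; that says nothing about the 0.95 accuracy reported for the 310 real abstracts.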

CONCLUSION

We here propose a conceptual framework for semi-automated extraction of epitope data embedded in scientific literature using principles from text mining and machine learning. The addition of such data will aid in the transition of biological databases to knowledgebases.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d648/3852072/53b09f638f06/1471-2164-14-S5-S14-1.jpg
