PepBank--a database of peptides based on sequence text mining and public peptide data sources.

Authors

Shtatland Timur, Guettler Daniel, Kossodo Misha, Pivovarov Misha, Weissleder Ralph

Affiliation

Center for Molecular Imaging Research, Massachusetts General Hospital, Harvard Medical School, Bldg. 149, 13th Street, Room 5406, Charlestown, MA 02129, USA.

Publication

BMC Bioinformatics. 2007 Aug 1;8:280. doi: 10.1186/1471-2105-8-280.

Abstract

BACKGROUND

Peptides are important molecules with diverse biological functions and biomedical uses. To date, there is no single, searchable archive of peptide sequences or their associated biological data. Instead, peptide sequences still have to be mined from abstracts and full-length articles and/or obtained from fragmented public sources.

DESCRIPTION

We have constructed a new database (PepBank), which at the time of writing contains a total of 19,792 individual peptide entries. The database has a web-based user interface with a simple, Google-like search function, advanced text search, and BLAST and Smith-Waterman search capabilities. The major source of peptide sequence data is text mining of MEDLINE abstracts. Another component of the database is peptide sequence data from public sources (ASPD and UniProt). A third, smaller part of the database is manually curated from sets of full-text articles and from text mining results. We demonstrate the utility of the database with several examples of affinity ligand discovery.
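
The abstract lists Smith-Waterman local alignment as one of PepBank's search modes. As a rough illustration only, the sketch below computes a Smith-Waterman local alignment score between two peptide sequences in Python. It assumes a simple scoring scheme (match +2, mismatch -1, linear gap -2, all arbitrary choices); PepBank's actual scoring matrix and gap settings are not given here, and production tools typically use a substitution matrix such as BLOSUM62 with affine gap penalties. The function name `smith_waterman_score` and the example sequences are hypothetical and are not taken from PepBank.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Assumed scoring (illustrative, not PepBank's settings): fixed
    match/mismatch scores and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    query = "RGDS"
    entries = ["GRGDSP", "KLAKLAK", "CNGRC"]  # illustrative sequences, not PepBank data
    ranked = sorted(entries, key=lambda s: smith_waterman_score(query, s), reverse=True)
    print(ranked)  # ['GRGDSP', 'CNGRC', 'KLAKLAK']
```

Ranking candidate entries by this score mirrors, in simplified form, how a similarity search can surface database peptides that share a local motif with a query (here, the exact RGDS stretch inside GRGDSP scores highest).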

CONCLUSION

We have created and maintain a database of peptide sequences. The database has biological and medical applications: for example, predicting the binding partners of biologically interesting peptides, developing peptide-based therapeutic or diagnostic agents, or predicting the molecular targets or binding specificities of peptides resulting from phage display selection. The database is freely available at http://pepbank.mgh.harvard.edu/, and the text mining source code (Peptide::Pubmed) is freely available from the same site as well as from CPAN (http://www.cpan.org/).
