PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine.

BACKGROUND: Most experimentally verified molecular-interaction and biological-pathway data are found in the unstructured text of biomedical journal articles, where computational methods cannot access them. The Biomolecular Interaction Network Database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable size of the task of backfilling the database could be reduced by first using Support Vector Machine (SVM) technology to locate interaction information in the literature. We present an information extraction system designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.

RESULTS: Cross-validation estimated the SVM's test-set precision, accuracy and recall for classifying abstracts that describe interaction information at 92%, 90% and 92%, respectively. We estimated that the system could recall up to 60% of all non-high-throughput interactions present in another yeast protein-interaction database. Finally, the system was applied to a real-world curation problem, where its use reduced the task duration by 70%, saving 176 days.

CONCLUSIONS: Machine learning methods are useful tools for directing the backfilling of interaction and pathway databases; however, this potential can be realized only if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.
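The RESULTS rest on two ideas: an SVM that labels each abstract as describing a protein-protein interaction or not, and cross-validated precision, accuracy and recall for that classifier. The sketch below illustrates both under stated assumptions and is not the PreBIND/Textomy implementation. It assumes plain bag-of-words features, labels of +1/-1, and a linear SVM trained with the Pegasos stochastic subgradient method, which is swapped in here purely for illustration because the abstract does not say how the SVM was trained. It also assumes that predictions from all held-out folds are pooled before scoring. Every function name (`tokenize`, `train_linear_svm`, `predict`, `evaluate`, `cross_validate`) is hypothetical.

```python
import random
import re
from collections import Counter


def tokenize(text):
    # Hypothetical feature extractor: lowercase bag-of-words token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def train_linear_svm(docs, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style linear SVM (hinge loss, L2 penalty, no bias term).

    labels: +1 if the abstract describes an interaction, -1 otherwise.
    """
    rng = random.Random(seed)
    feats = [tokenize(d) for d in docs]
    w = {}  # sparse weight vector keyed by token
    order = list(range(len(docs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = feats[i], labels[i]
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            # Regularization: shrink every weight by (1 - eta * lam).
            # Rescanning w each step is O(|w|); real code keeps a scale factor.
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss subgradient: move toward y * x if the margin is violated.
            if margin < 1:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    # Sign of the linear score; a score of exactly 0 goes to the negative class.
    score = sum(w.get(f, 0.0) * v for f, v in tokenize(text).items())
    return 1 if score > 0 else -1


def evaluate(y_true, y_pred):
    """Return (precision, recall, accuracy) with +1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


def cross_validate(docs, labels, k=5, seed=0):
    """k-fold cross-validation; predictions from all held-out folds are pooled."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    y_true, y_pred = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_linear_svm([docs[i] for i in train],
                             [labels[i] for i in train], seed=seed)
        for i in fold:
            y_true.append(labels[i])
            y_pred.append(predict(w, docs[i]))
    return evaluate(y_true, y_pred)
```

Given a labelled corpus of abstracts, `cross_validate(docs, labels, k=10)` would return pooled (precision, recall, accuracy). Read against the reported figures, precision is the fraction of abstracts flagged as interaction-bearing that truly are, recall is the fraction of truly interaction-bearing abstracts that get flagged, and accuracy counts correct calls in both classes. Averaging the metrics per fold instead of pooling is an equally common choice.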
