Donaldson Ian, Martin Joel, de Bruijn Berry, Wolting Cheryl, Lay Vicki, Tuekam Brigitte, Zhang Shudong, Baskin Berivan, Bader Gary D, Michalickova Katerina, Pawson Tony, Hogue Christopher W V
Samuel Lunenfeld Research Institute, Toronto, M5G 1X5, Canada.
BMC Bioinformatics. 2003 Mar 27;4:11. doi: 10.1186/1471-2105-4-11.
BACKGROUND: Most experimentally verified molecular interaction and biological pathway data reside in the unstructured text of biomedical journal articles, where they are inaccessible to computational methods. The Biomolecular Interaction Network Database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.
RESULTS: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts that describe interaction information to be 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast protein interaction database. Finally, the system was applied to a real-world curation problem, where its use was found to reduce the task duration by 70%, saving 176 days.
CONCLUSIONS: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.
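As a minimal sketch of how the three figures reported in RESULTS relate to one another, the snippet below computes precision, recall and accuracy from binary labels on a held-out fold. The function name `classification_metrics`, the `Metrics` container and the ten toy labels are hypothetical and purely illustrative; they are not the PreBIND code or data, and the abstract does not specify how the authors computed their estimates beyond the use of cross-validation.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    accuracy: float


def classification_metrics(y_true, y_pred):
    """Compute precision, recall and accuracy for binary labels.

    A label of 1 means "abstract describes an interaction", 0 means it does not.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    # Guard against empty denominators rather than dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Metrics(precision, recall, accuracy)


# Hypothetical toy fold: curator labels versus classifier predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
m = classification_metrics(y_true, y_pred)
print(m)
```

In a k-fold cross-validation, a function like this would be applied to each held-out fold and the results averaged. Precision reflects how many flagged abstracts are worth a curator's time, while recall reflects how many interaction-bearing abstracts are actually found.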