A detailed error analysis of 13 kernel methods for protein-protein interaction extraction.

Author affiliation

Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, 10099 Berlin, Germany.

Publication information

BMC Bioinformatics. 2013 Jan 16;14:12. doi: 10.1186/1471-2105-14-12.

DOI:10.1186/1471-2105-14-12

PMID:23323857

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3680070/

Abstract

BACKGROUND

Kernel-based classification is the current state of the art for extracting pairs of interacting proteins (PPIs) from free text. Various approaches have been proposed, which differ mainly in the specific kernel function, the type of input representation, and the feature sets. These approaches are regularly compared with each other on their overall performance across different gold standard corpora, but little is known about their respective performance at the instance level.

RESULTS

We report on a detailed analysis of the shared characteristics and the differences between 13 current methods using five PPI corpora. We identified a large number of rather difficult (misclassified by most methods) and easy (correctly classified by most methods) PPIs. We show that kernels using the same input representation perform similarly on these pairs and that building ensembles from dissimilar kernels leads to a significant performance gain. However, our analysis also reveals that difficult pairs share few characteristics, which lowers the hope that new methods, if built along the same lines as current ones, will deliver breakthroughs in extraction performance.
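The instance-level view described above can be illustrated with a minimal sketch. It is not the authors' implementation: the method names, the toy predictions, the 0.25/0.75 thresholds used to stand in for "most methods", and the plain majority vote used as the ensemble are all assumptions made for illustration only.

```python
from typing import Dict, List


def count_errors(gold: List[int], predictions: Dict[str, List[int]]) -> List[int]:
    """For each candidate pair, count how many methods misclassified it."""
    return [
        sum(preds[i] != label for preds in predictions.values())
        for i, label in enumerate(gold)
    ]


def difficulty_labels(errors: List[int], n_methods: int,
                      easy_max: float = 0.25,
                      difficult_min: float = 0.75) -> List[str]:
    """Label each pair by the fraction of methods that got it wrong.

    The two thresholds are hypothetical; the abstract only says
    "misclassified by most methods" and "correctly classified by most methods".
    """
    labels = []
    for e in errors:
        rate = e / n_methods
        if rate >= difficult_min:
            labels.append("difficult")
        elif rate <= easy_max:
            labels.append("easy")
        else:
            labels.append("neutral")
    return labels


def majority_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Simple ensemble: predict 1 when more than half of the methods predict 1."""
    n_methods = len(predictions)
    n_pairs = len(next(iter(predictions.values())))
    return [
        1 if 2 * sum(p[i] for p in predictions.values()) > n_methods else 0
        for i in range(n_pairs)
    ]


# Toy data (invented): 1 = interacting pair, 0 = non-interacting pair.
gold = [1, 0, 1, 0, 1]
predictions = {
    "kernel_a": [1, 0, 0, 1, 0],  # hypothetical method names
    "kernel_b": [1, 0, 0, 0, 1],
    "kernel_c": [1, 1, 0, 0, 1],
}

errors = count_errors(gold, predictions)
labels = difficulty_labels(errors, n_methods=len(predictions))
ensemble = majority_vote(predictions)
```

In this toy setting the third pair is missed by every method and is labeled difficult, and the vote cannot recover it. This mirrors the point that ensembles only help where the member kernels make different errors.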

CONCLUSIONS

Our experiments show that current methods do not seem to do very well in capturing the shared characteristics of positive PPI pairs, which must also be attributed to the heterogeneity of the (still very few) available corpora. Our analysis suggests that performance improvements should be sought in novel feature sets rather than in novel kernel functions.
