Dynamic programming re-ranking for PPI interactor and pair extraction in full-text articles.

Author information

Department of Computer Science & Engineering, Yuan Ze University, Chung-Li, Taiwan, R.O.C.

Publication information

BMC Bioinformatics. 2011 Feb 23;12:60. doi: 10.1186/1471-2105-12-60.

Abstract

BACKGROUND

Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be facilitated by text-mining systems that identify the genes that play the interactor role in PPIs, map these genes to unique database identifiers (the interactor normalization task, or INT), and then return a list of interaction pairs for each article (the interaction pair task, or IPT). Both tasks are evaluated by the area under the interpolated precision/recall curve (AUC iP/R), because the order of identifiers in the output list matters for ease of curation.
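To make the rank sensitivity of this metric concrete, the following is a minimal sketch of one common formulation of AUC iP/R, in which the interpolated precision at a recall level is the highest precision achieved at that recall or any higher recall. The function name `auc_ipr` and the example identifiers are illustrative assumptions, not taken from the paper or from the official BioCreative evaluation software, and the sketch assumes that each identifier appears at most once in the ranked list.

```python
from typing import Hashable, Sequence, Set


def auc_ipr(ranked: Sequence[Hashable], gold: Set[Hashable]) -> float:
    """Area under the interpolated precision/recall curve for one ranked list.

    Assumes `ranked` contains no duplicate identifiers (illustrative sketch).
    """
    if not gold:
        return 0.0

    # Record (recall, precision) at every rank where a correct item appears.
    points = []
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in gold:
            hits += 1
            points.append((hits / len(gold), hits / rank))

    # Interpolated precision at recall r: best precision at any recall >= r.
    interpolated = []
    best = 0.0
    for _, precision in reversed(points):
        best = max(best, precision)
        interpolated.append(best)
    interpolated.reverse()

    # Sum interpolated precision times the recall gained at each step.
    area = 0.0
    prev_recall = 0.0
    for (recall, _), p_interp in zip(points, interpolated):
        area += p_interp * (recall - prev_recall)
        prev_recall = recall
    return area


if __name__ == "__main__":
    gold_ids = {"P1", "P2"}
    # The same identifiers in a different order yield different scores.
    print(auc_ipr(["P1", "X", "P2"], gold_ids))
    print(auc_ipr(["X", "P1", "P2"], gold_ids))
```

Under this formulation, moving a correct identifier higher in the list raises the score even when the set of returned identifiers is unchanged, which is why re-ranking alone can improve results.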

RESULTS

Our INT system, developed for the BioCreAtIvE II.5 INT challenge, achieved a promising AUC iP/R of 43.5% using a support vector machine (SVM)-based ranking procedure. Our new re-ranking algorithm improves this AUC iP/R by 1.84%. Our experimental results also show that, with the re-ranked INT results, our unsupervised IPT system achieves a competitive AUC iP/R of 23.86%, which outperforms the best BC II.5 INT system by 1.64%. Compared with using only the SVM-ranked INT results, using the re-ranked INT results boosts AUC iP/R by 7.84%. A t-test shows that our INT/IPT system with re-ranking outperforms the system without re-ranking by a statistically significant margin.

CONCLUSIONS

In this paper, we present a new re-ranking algorithm that considers co-occurrence among identifiers in an article to improve INT and IPT ranking results. By combining the re-ranked INT results with an unsupervised approach for finding associations among interactors, the proposed method boosts IPT performance. We also implement score computation using dynamic programming, which is faster and more efficient than traditional approaches.
