Caufield J Harry, Ping Peipei
The NIH BD2K Center of Excellence in Biomedical Computing, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A.
Department of Physiology, University of California at Los Angeles, Los Angeles, CA 90095, U.S.A.
Emerg Top Life Sci. 2019 Aug 16;3(4):357-369. doi: 10.1042/ETLS20190003.
Protein-protein interactions (PPIs) constitute a basic unit of our understanding of protein function. Although substantial effort has been made to organize PPI knowledge into structured databases, maintaining these resources requires careful manual curation. Even then, many PPIs remain uncurated within unstructured text. Extracting PPIs from experimental research supports the assembly of PPI networks and highlights relationships crucial to elucidating protein functions. Isolating specific protein-protein relationships from numerous documents is technically demanding by both manual and automated means. Recent advances in the design of extraction methods have leveraged emerging computational developments and have demonstrated impressive results on test datasets. In this review, we discuss recent developments in PPI extraction from unstructured biomedical text. We explore the historical context of these developments, recent strategies for integrating and comparing PPI data, and their application to advancing the understanding of protein function. Finally, we describe the challenges of applying PPI mining to text concerning protein families, using the multifunctional 14-3-3 protein family as an example.