Öztürk Hakime, Özgür Arzucan, Schwaller Philippe, Laino Teodoro, Ozkirimli Elif
Department of Computer Engineering, Bogazici University, Istanbul, Turkey.
IBM Research - Zurich, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland.
Drug Discov Today. 2020 Apr;25(4):689-705. doi: 10.1016/j.drudis.2020.01.020. Epub 2020 Feb 3.
Text-based representations of chemicals and proteins can be thought of as unstructured languages codified by humans to describe domain-specific knowledge. Advances in natural language processing (NLP) methodologies for spoken languages have accelerated the application of NLP to textual representations of these biochemical entities, both to elucidate the hidden knowledge they contain and to use that knowledge to build models that predict molecular properties or design novel molecules. This review outlines the impact of these advances on drug discovery and aims to further the dialogue between medicinal chemists and computer scientists.