Information Retrieval and Text Mining Technologies for Chemistry.

Author information

Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain.

Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain.

Publication information

Chem Rev. 2017 Jun 28;117(12):7673-7761. doi: 10.1021/acs.chemrev.6b00851. Epub 2017 May 5.

Abstract

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

