Unilever Centre for Molecular Science Informatics, Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, UK.
J Cheminform. 2011 Oct 14;3(1):40. doi: 10.1186/1758-2946-3-40.
Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In chemistry, a huge amount of information is available in the published literature, the vast majority of which is not in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents comprising 10 weeks' worth of publications from the European Patent Office (EPO), with a precision of 78% and a recall of 64% for determining the identity and amount of the reactants employed, and an accuracy of 92% for product identification. NMR spectra reported as product characterisation data are additionally captured.
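For readers less familiar with the metrics quoted above, the following is a minimal sketch of how precision and recall could be computed for one reaction's reactants. The abstract does not describe the matching procedure, so the exact-match comparison of (name, amount) pairs, the function name `precision_recall`, and the sample data are all assumptions made for illustration only; they are not taken from PatentEye.

```python
def precision_recall(extracted: set, gold: set) -> tuple[float, float]:
    """Return (precision, recall) of an extracted set against a gold-standard set."""
    true_positives = len(extracted & gold)  # items found in both sets
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (reactant name, amount) pairs; not data from the paper.
gold = {
    ("benzaldehyde", "1.0 g"),
    ("sodium borohydride", "0.4 g"),
    ("methanol", "20 mL"),
}
extracted = {
    ("benzaldehyde", "1.0 g"),
    ("methanol", "20 mL"),
    ("water", "5 mL"),  # spurious extraction, lowers precision
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three extracted pairs are correct and two of the three gold pairs are recovered, so both values are 2/3. A stricter or looser matching rule (for example, normalising units or chemical names before comparison) would change the counts.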