Tchoua Roselyne B, Chard Kyle, Audus Debra, Qin Jian, de Pablo Juan, Foster Ian
Department of Computer Science, The University of Chicago, Chicago, IL, USA.
The Computation Institute, The University of Chicago and Argonne, Chicago, IL, USA.
Procedia Comput Sci. 2016;80:386-397. doi: 10.1016/j.procs.2016.05.338. Epub 2016 Jun 1.
A wealth of valuable data is locked within the millions of research articles published each year. Reading those articles and extracting pertinent information from them has become an unmanageable task for scientists. This problem hinders scientific progress by making it hard to build on results buried in the literature. Moreover, these data are loosely structured, encoded in manuscripts of various formats, embedded in different content types, and, in general, not machine accessible. We present a hybrid human-computer solution for semi-automatically extracting scientific facts from the literature. This solution combines an automated discovery, download, and extraction phase with a semi-expert crowd of students who extract specific scientific facts. To evaluate our approach, we apply it to a challenging molecular engineering scenario: extraction of a polymer property, the Flory-Huggins interaction parameter. We demonstrate useful contributions to a comprehensive database of polymer properties.
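The division of labor between the automated phase and the crowd can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the Candidate record, and the find_candidates and review functions are all hypothetical names introduced here, and a simple pattern match stands in for whatever extraction the actual system performs. The automated step flags sentences that appear to report a Flory-Huggins (chi) value, and a reviewer callback then plays the role of the semi-expert crowd by accepting or rejecting each candidate.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical pattern (an assumption, not from the paper): matches text
# such as "χ = 0.43", "chi = -0.012", or "chi of 0.5".
CHI_PATTERN = re.compile(
    r"(?:χ|\bchi\b)\s*(?:=|≈|of)\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


@dataclass
class Candidate:
    """One automatically flagged value awaiting human verification."""
    sentence: str
    value: float
    verified: bool = False


def find_candidates(text: str) -> list[Candidate]:
    """Automated phase: flag sentences that appear to report a chi value."""
    candidates = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in CHI_PATTERN.finditer(sentence):
            candidates.append(Candidate(sentence.strip(), float(match.group(1))))
    return candidates


def review(candidates: list[Candidate],
           accept: Callable[[Candidate], bool]) -> list[Candidate]:
    """Human phase: keep only the candidates a reviewer accepts."""
    verified = []
    for candidate in candidates:
        if accept(candidate):
            candidate.verified = True
            verified.append(candidate)
    return verified


if __name__ == "__main__":
    # Made-up example text, for illustration only.
    sample = (
        "We measured PS/PMMA blends. "
        "The interaction parameter was chi = 0.041 at 150 C. "
        "Unrelated sentence with 3.5 nm domains."
    )
    flagged = find_candidates(sample)
    # A lambda stands in for a human reviewer's judgment here.
    kept = review(flagged, lambda c: 0.0 <= c.value < 1.0)
    for c in kept:
        print(c.value, "|", c.sentence)
```

In a real deployment the accept callback would be replaced by an interface that shows each flagged sentence to a student reviewer, so that the automated step narrows the search and the human step supplies the judgment.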