Shetty Pranav, Ramprasad Rampi
School of Computational Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta, GA 30332, USA.
School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta, GA 30332, USA.
iScience. 2020 Dec 10;24(1):101922. doi: 10.1016/j.isci.2020.101922. eCollection 2021 Jan 22.
The materials science literature has grown exponentially in recent years, making it difficult for any individual to master all of this information. This limits the new hypotheses that scientists can formulate. In this work, we explore whether materials science knowledge can be automatically inferred from the text of journal papers. Using a data set of 0.5 million polymer papers and natural language processing methods, we show that vector representations trained for every word in our corpus can indeed capture this knowledge in a completely unsupervised manner. We perform time-based studies in which we track the popularity of various polymers for different applications, and we predict new polymers for novel applications based solely on the domain knowledge contained in our data set. Using correlations detected automatically from the literature in this manner thus opens up a new paradigm for materials discovery.
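The abstract describes predicting polymers for applications from word vectors but does not give the procedure. One common way to use such vectors is to rank candidate words by their cosine similarity to an application term. The following is a minimal sketch of that ranking step only, not the authors' pipeline. The function names, the token names (`solar_cell`, `polymer_a`, and so on), and the 3-dimensional vectors are hypothetical and invented for illustration; real embeddings would be trained on the corpus and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates, vectors):
    """Sort candidate words by similarity to the query word, highest first."""
    scored = [
        (name, cosine_similarity(vectors[query], vectors[name]))
        for name in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors with made-up values (assumption: stand-ins for trained embeddings).
toy_vectors = {
    "solar_cell": [0.9, 0.1, 0.2],
    "polymer_a": [0.8, 0.2, 0.1],
    "polymer_b": [0.1, 0.9, 0.3],
    "polymer_c": [0.5, 0.5, 0.5],
}

ranking = rank_candidates(
    "solar_cell", ["polymer_a", "polymer_b", "polymer_c"], toy_vectors
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy values, `polymer_a` ranks first because its vector points in nearly the same direction as that of `solar_cell`. In a time-based study of the kind the abstract mentions, the same ranking could be repeated on embeddings trained on papers up to a given year, though how the authors actually did this is not stated here.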