Zhang Lei, He Mu
Institute of Advanced Materials and Flexible Electronics (IAMFE), School of Chemistry and Materials Science, Nanjing University of Information Science & Technology, 210044, Nanjing, People's Republic of China.
Department of Materials Physics, School of Chemistry and Materials Science, Nanjing University of Information Science & Technology, 210044, Nanjing, People's Republic of China.
J Phys Condens Matter. 2021 Dec 15;34(9). doi: 10.1088/1361-648X/ac3e1e.
Despite significant advances in data-driven research in the physical sciences, the abundant textual data in the literature have not been fully exploited by the physics and materials communities. In this manuscript, we use natural language processing (NLP) to predict the existence of solar cell types, including dye-sensitized solar cells and perovskite solar cells. The prediction is unsupervised, requires no human annotation, and uses only literature published before each type was first discovered. Building on this result, we use NLP to identify possible solar cell material candidates, starting from a comprehensive training database of 3.2 million paper abstracts published before 2021. The NLP model successfully recovers existing solar cell materials and suggests an uncommon solar cell material, PtSe₂, as a suitable candidate for future solar cells. We comprehensively investigate its optoelectronic properties with first-principles calculations, which reveal that this NLP-predicted candidate has decent stability and optoelectronic performance. This study demonstrates the viability of textual data for data-driven materials prediction and highlights NLP as a powerful tool for reliably predicting solar cell materials.
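The abstract does not state which NLP model is used. One widely used approach for literature-based materials prediction is to train word embeddings (for example, word2vec) on an abstract corpus. Candidate material names are then ranked by cosine similarity to a target concept such as "photovoltaic". The minimal sketch below assumes that approach. The function names `cosine` and `rank_candidates`, the placeholder tokens `material_A`/`material_B`/`material_C`, and the three-dimensional vectors are all hypothetical toy values, not results from this study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(embeddings, target, candidates):
    """Return (candidate, score) pairs sorted by similarity to the target word."""
    target_vec = embeddings[target]
    scores = {c: cosine(embeddings[c], target_vec) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy embeddings. Real vectors would be learned from the
# abstract corpus and typically have hundreds of dimensions.
toy_embeddings = {
    "photovoltaic": [0.9, 0.1, 0.2],
    "material_A": [0.85, 0.15, 0.25],
    "material_B": [0.7, 0.3, 0.2],
    "material_C": [0.1, 0.9, 0.1],
}

ranking = rank_candidates(
    toy_embeddings, "photovoltaic", ["material_A", "material_B", "material_C"]
)
for material, score in ranking:
    print(f"{material}: {score:.3f}")
```

In this toy setup, `material_A` ranks first because its vector points in nearly the same direction as "photovoltaic". Cosine similarity ignores vector length, so a material's score reflects the contexts it appears in rather than how often it is mentioned.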