Visualization of Solar Cell Library Space by Dimensionality Reduction Methods.

Author information

Department of Chemistry, Bar-Ilan University, Ramat-Gan 5290002, Israel.

Department of Information Systems, College of Law & Business, Ramat-Gan, P.O. Box 852, Bnei Brak 5110801, Israel.

Publication information

J Chem Inf Model. 2018 Dec 24;58(12):2428-2439. doi: 10.1021/acs.jcim.8b00552. Epub 2018 Dec 13.

Abstract

Visualizing high-dimensional data by projecting them into a two- or three-dimensional space is a popular approach in many scientific fields, including computer-aided drug design and cheminformatics. In contrast, dimensionality reduction techniques have been far less explored for materials informatics. Nevertheless, similar to their usefulness in analyzing the space of, e.g., drug-like molecules, such techniques could provide useful insights on materials space, including an intuitive grasp of the overall distribution of samples, the identification of interesting trends, including the formation of materials clusters and the presence of activity cliffs and outliers, and rational navigation through this space in the search for new materials. Here we present the first application of four dimensionality reduction techniques, namely, principal component analysis (PCA), kernel PCA, Isomap, and diffusion map, to visualize and analyze a part of the materials space populated by solar cells made of metal oxides. Solar cells in general and metal-oxide-based solar cells in particular hold the promise of contributing to the world's search for clean and affordable energy resources. With the exception of PCA, these methods have seldom been used to visualize chemistry space and almost never been used to visualize materials space. For this purpose, we integrated five metal-oxide-based solar cell libraries into a uniform database and subjected it to dimensionality reduction by all four methods, comparing their performances using various criteria such as maintaining the local environment of samples and the clustering structure in the low-dimensional space. We also looked at the number of outliers produced by each method and analyzed common outliers. 
We found that PCA performs best in terms of the ability to correctly maintain the local environment of samples, whereas Isomap does the best job of assigning class membership on the basis of the identities of nearest neighbors (i.e., it is the best classifier). We also found that many of the outliers identified by all of the methods could be rationalized. We suggest that the methods used in this work could be extended to study other types of solar cells, thereby setting the ground for further analysis of the photovoltaic (PV) space as well as other regions of materials space.
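The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: the dataset is synthetic (five overlapping Gaussian clusters standing in, very loosely, for five libraries), scikit-learn's `trustworthiness` score is assumed as a proxy for "maintaining the local environment of samples", cross-validated k-nearest-neighbor accuracy in the 2-D embedding is assumed as a proxy for nearest-neighbor class assignment, and `diffusion_map` is a hypothetical helper written here because scikit-learn does not ship one. All parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import Isomap, trustworthiness
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def diffusion_map(X, n_components=2, epsilon=None):
    """Hypothetical minimal diffusion map (Gaussian kernel, one diffusion step)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if epsilon is None:
        epsilon = np.median(d2)  # bandwidth heuristic (assumption)
    K = np.exp(-d2 / epsilon)
    deg = K.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as the Markov matrix D^-1 K.
    A = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Dividing by the leading eigenvector recovers the right eigenvectors
    # of the Markov matrix (up to scale); the first one is constant.
    psi = vecs / vecs[:, [0]]
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1]


# Synthetic stand-in data: 300 samples, 10 descriptors, 5 class labels.
X_raw, y = make_blobs(n_samples=300, n_features=10, centers=5,
                      cluster_std=2.5, center_box=(-4.0, 4.0), random_state=0)
X = StandardScaler().fit_transform(X_raw)

embeddings = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "Kernel PCA": KernelPCA(n_components=2, kernel="rbf").fit_transform(X),
    "Isomap": Isomap(n_neighbors=10, n_components=2).fit_transform(X),
    "Diffusion map": diffusion_map(X, n_components=2),
}

results = {}
for name, emb in embeddings.items():
    # How well are each sample's nearest neighbors kept in 2-D?
    trust = trustworthiness(X, emb, n_neighbors=10)
    # How well do 2-D nearest neighbors predict the class label?
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), emb, y, cv=5).mean()
    results[name] = (trust, acc)
    print(f"{name:14s} trustworthiness={trust:.3f}  kNN accuracy={acc:.3f}")
```

On real data the ranking of the four methods depends on the descriptors, the kernel and neighborhood settings, and the metric chosen, so the scores printed here say nothing about the paper's findings.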
