Performance prediction of polymer-fullerene organic solar cells and data mining-assisted designing of new polymers.

Authors

Xiao Fei, Saqib Muhammad, Razzaq Soha, Mubashir Tayyaba, Tahir Mudassir Hussain, Moussa Ihab Mohamed, El-Ansary Hosam O

Affiliations

College of Computer Science, Huanggang Normal University, Huanggang, 438000, Hubei, China.

Institute of Chemistry, Khwaja Fareed University of Engineering & Information Technology, Rahim Yar Khan, 64200, Pakistan.

Publication information

J Mol Model. 2023 Aug 2;29(8):270. doi: 10.1007/s00894-023-05677-3.

Abstract

CONTEXT

Selecting high-performance polymer materials for organic solar cells (OSCs) remains an important goal for improving device morphology, stability, and efficiency. Machine learning has been reported as a powerful set of techniques for solving complex problems and for guiding researchers as they screen, map, and develop high-performance materials. In the present work, we applied machine learning tools to screen data from reported studies and to design new polymer acceptor materials. Quantitative structure-activity relationship (QSAR) models were generated using machine learning-assisted simulation techniques. For this purpose, 3000 molecular descriptors were generated, and those with a key effect on power conversion efficiency (PCE) were identified. Several regression models (e.g., random forest and bagging regressor models) were then developed to predict the PCE. New materials were designed on the basis of similarity analysis. The GDB17 chemical database, which consists of 166 million organic molecules in an ordered form, was used for the similarity analysis, and the similarity between GDB17 materials and materials reported in the literature was studied using RDKit (a cheminformatics toolkit). Notably, 100 monomers proved to be unique and effective, and their PCEs were predicted. Four of these monomers showed a predicted PCE higher than 14%, which is better than the values in various reported studies. Our methodology provides a unique, time- and cost-efficient approach to screening and designing new polymers for OSCs using similarity analysis, without revisiting the reported studies.
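The similarity step described above can be illustrated with a minimal RDKit sketch. The abstract does not state which fingerprint or similarity metric was used, so the Morgan fingerprint (radius 2, 2048 bits) and the Tanimoto coefficient below are assumptions, as are the SMILES strings and the helper names `fingerprint` and `rank_by_similarity`, which are hypothetical and chosen only for illustration.

```python
from rdkit import Chem
from rdkit.Chem import DataStructs, rdFingerprintGenerator

# Hypothetical molecules for illustration only (not taken from the paper).
reference_smiles = ["c1ccsc1", "c1ccc2sccc2c1"]  # stand-ins for literature monomers
candidate_smiles = ["c1ccsc1", "Cc1cccs1", "c1ccoc1", "CCO"]  # stand-ins for database entries

# Assumed fingerprint settings: Morgan, radius 2, 2048 bits.
_generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def fingerprint(smiles):
    """Parse a SMILES string and return its Morgan bit-vector fingerprint."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return _generator.GetFingerprint(mol)


def rank_by_similarity(candidates, references):
    """Score each candidate by its best Tanimoto similarity to any reference."""
    reference_fps = [fingerprint(smi) for smi in references]
    scored = []
    for smi in candidates:
        sims = DataStructs.BulkTanimotoSimilarity(fingerprint(smi), reference_fps)
        scored.append((smi, max(sims)))
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_similarity(candidate_smiles, reference_smiles)
for smi, score in ranking:
    print(f"{smi:12s} {score:.3f}")
```

Taking the maximum over the reference set asks how close a candidate is to its nearest reported material; a mean over references would instead favor candidates that resemble the set as a whole. Which of these the authors used is not stated.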

METHODS

For the machine learning analysis, data were collected from reported studies and online databases. Molecular descriptors were generated for the polymer materials using Dragon software, with the 3D structures of the studied molecules supplied as input in structure data file (SDF) format. About 3000 molecular descriptors were generated and exported in comma-separated value (.csv) format. Univariate regression analysis was performed to shortlist the best descriptors, which were then used to train the machine learning models. Data analysis and visualization were carried out with the Python packages Matplotlib, NumPy, pandas, scikit-learn, seaborn, and SciPy. Random forest and bagging regressor models were applied for the machine learning analysis, and the cheminformatics toolkit RDKit was used for the similarity analysis.
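The descriptor shortlisting and regression steps might look roughly like the following scikit-learn sketch. It runs on synthetic data, because the Dragon descriptor table is not part of this record. Using `f_regression` as the univariate regression screen, keeping five descriptors, and all hyperparameters are assumptions made for illustration, not the authors' reported settings.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a descriptor table: 200 "polymers" x 50 "descriptors".
rng = np.random.default_rng(0)
n_samples, n_descriptors = 200, 50
X = rng.normal(size=(n_samples, n_descriptors))
# Synthetic target playing the role of PCE: depends on descriptors 0, 1, 2 plus noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.3, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Univariate screen: score each descriptor separately against the target
# and keep the k best (k=5 is an arbitrary choice for this sketch).
selector = SelectKBest(score_func=f_regression, k=5).fit(X_train, y_train)
selected = np.flatnonzero(selector.get_support())
print("Selected descriptor columns:", selected.tolist())

# Train both ensemble regressors on the shortlisted descriptors only.
models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "bagging": BaggingRegressor(n_estimators=50, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(selector.transform(X_train), y_train)
    predictions = model.predict(selector.transform(X_test))
    scores[name] = r2_score(y_test, predictions)
    print(f"{name}: test R^2 = {scores[name]:.3f}")
```

Fitting the selector on the training split only keeps the held-out score from being inflated by information from the test set. With a real descriptor file, `X` and `y` would come from the exported .csv instead of the random generator.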

