Siddiqui Hamza, Usmani Tahsin
Organic PV Lab, Integral University, Lucknow 226026, India.
ACS Omega. 2024 Jul 31;9(32):34445-34455. doi: 10.1021/acsomega.4c02157. eCollection 2024 Aug 13.
To enhance the efficiency of organic solar cells, accurately predicting the efficiency of new pairs of donor and acceptor materials is crucial. Presently, most machine learning studies rely on regression models, which often struggle to establish clear rules for distinguishing between high- and low-performing donor-acceptor pairs. This study proposes a novel approach that integrates interpretable AI, specifically Shapley values, with four supervised machine learning classification models: support vector machines, decision trees, random forest, and gradient boosting. These models aim to identify high-efficiency donor-acceptor pairs based solely on chemical structures and to extract important features that establish general design principles for distinguishing between high- and low-efficiency pairs. For validation, an unsupervised machine learning algorithm that uses loading vectors obtained from principal component analysis is employed to identify crucial features associated with high-efficiency donor-acceptor pairs. Interestingly, the features identified by the supervised machine learning approach were found to be a subset of those identified by the unsupervised method. Noteworthy features include the van der Waals surface area, partial equalization of orbital electronegativity, Moreau-Broto autocorrelation, and molecular substructures. Leveraging these features, a backward-working model can be developed, facilitating exploration across a wide array of materials used in organic solar cells. This approach will help navigate the vast chemical compound space of donor and acceptor materials essential to creating high-efficiency organic solar cells.
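As a rough illustration of how Shapley values attribute a model's output to individual features, the sketch below computes exact Shapley values by enumerating feature coalitions for a toy scoring function. It is not the authors' pipeline: the descriptor names, the `score` function, the input values, and the all-zero baseline are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial

# Hypothetical descriptor names loosely echoing the abstract; all values are made up.
FEATURES = ["vdw_surface_area", "peoe_charge", "autocorrelation"]


def score(x):
    """Toy 'high-efficiency' score with one interaction term (illustrative only)."""
    return (
        2.0 * x["vdw_surface_area"]
        + 1.0 * x["peoe_charge"]
        + 0.5 * x["vdw_surface_area"] * x["autocorrelation"]
    )


def value(coalition, x, baseline):
    """Score when features in `coalition` come from x and the rest from baseline."""
    mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
    return score(mixed)


def shapley_values(x, baseline):
    """Exact Shapley values: weighted average marginal contribution over all coalitions."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}, x, baseline) - value(s, x, baseline))
        phi[f] = total
    return phi


x = {"vdw_surface_area": 1.0, "peoe_charge": 2.0, "autocorrelation": 4.0}
baseline = {f: 0.0 for f in FEATURES}
phi = shapley_values(x, baseline)

for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

The attributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is split evenly between the two features involved. Exact enumeration scales exponentially with the number of features, so real descriptor sets typically call for approximate or model-specific estimators.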