Okyay Tugba Muhlise, Yilmaz Ibrahim, Koldas Macit
Medical Biochemistry, University of Health Sciences, 34956, Istanbul, Türkiye.
Department of Medical Biochemistry, Health Science University Istanbul Haseki Training and Research Hospital, Istanbul, Türkiye.
Photochem Photobiol Sci. 2025 May 15. doi: 10.1007/s43630-025-00733-8.
Understanding the relationship between molecular structure and bioactivity is crucial for optimizing porphyrin-based therapeutics. By integrating cheminformatics techniques with machine learning models, our work enables the efficient classification of compounds based on their molecular structures and their growth inhibition capabilities (IC50). A dataset of 317 porphyrin derivatives was compiled, incorporating molecular descriptors and biological activity data. Descriptive statistical analysis was performed to examine compound distribution and key features. Clustering analysis was conducted using hierarchical clustering and fingerprint similarity matrices to classify compounds based on structural similarity. Lipinski's Rule of Five was applied to assess drug-likeness, while Murcko scaffold analysis identified core structural patterns. Tumor response data were analyzed to evaluate therapeutic efficacy. Machine learning models were implemented to predict bioactivity. Descriptive statistics highlighted bioactive compounds, with TMPyP4 and Temoporfin being the most studied. The quantitative estimate of drug-likeness and the number of aliphatic carboxylic acids were identified as the most influential descriptors for bioactivity. Hierarchical clustering segmented the porphyrins into nine structural groups. The analysis identified 168 compounds as active by pIC50, 31 of which met Lipinski's criteria, and 11 that were both effective and bioavailable. Tumor response analysis revealed three porphyrins achieving 100% response. Logistic Regression emerged as the best-performing model, achieving 83% accuracy and demonstrating robust predictive capabilities. This study successfully characterized porphyrin derivatives, reviewing key molecular features influencing bioactivity and evaluating their therapeutic potential. It highlights the potential of machine learning in predicting the biological activity status of porphyrin derivatives.
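The overlap between potency and drug-likeness described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The `Compound` fields, the pIC50 cutoff of 6.0, the zero-violation requirement, and all function names are assumptions made for illustration, since the abstract does not state the activity threshold or how many Lipinski violations were tolerated. The descriptor values are assumed to be precomputed (for example with a cheminformatics toolkit).

```python
import math
from dataclasses import dataclass


@dataclass
class Compound:
    # Hypothetical record layout; field names are illustrative only.
    name: str
    ic50_nm: float      # IC50 in nanomolar
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)


def lipinski_violations(c: Compound) -> int:
    """Count violations of Lipinski's Rule of Five."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def shortlist(compounds, pic50_cutoff=6.0, max_violations=0):
    """Return names of compounds that are both active and drug-like.

    pic50_cutoff and max_violations are assumed values, not taken
    from the study.
    """
    return [
        c.name
        for c in compounds
        if pic50(c.ic50_nm) >= pic50_cutoff
        and lipinski_violations(c) <= max_violations
    ]
```

Calling `shortlist` on a list of `Compound` records returns the names that pass both filters, which corresponds conceptually to the subset the abstract describes as both effective and bioavailable. Porphyrins often exceed the 500 Da limit, so the choice of `max_violations` strongly affects how many compounds survive.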