Department of Computer Science and Engineering, University of Calcutta, JD-2, Sector-III, Salt Lake, Kolkata, 700098, India.
Department of Biological Sciences, Bose Institute, EN 80, Sector V, Bidhan Nagar, Kolkata, 700091, India.
Comput Biol Med. 2024 May;174:108413. doi: 10.1016/j.compbiomed.2024.108413. Epub 2024 Apr 5.
Lifestyle-related diseases (LSDs) impose a substantial economic burden on patients and health care services. LSDs are chronic in nature and can directly affect the heart and lungs. Therapeutic interventions based only on symptoms can be crucial for prompt treatment initiation in LSDs, as symptoms are the first information available to clinicians. This work therefore aims to apply unsupervised machine learning (ML) techniques to develop models that predict drugs from symptoms for LSDs, with a specific focus on pulmonary and heart diseases.
The drug-disease and disease-symptom associations of 143 LSDs, 1271 drugs, and 305 symptoms were used to compute direct associations between drugs and symptoms. Four clustering algorithms were used to build ML models that cluster the drugs using symptoms as features: K-Means, Bisecting K-Means, Mean Shift, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). The optimal model was saved on a server, and a web application was developed to perform predictions with it.
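One way to picture the step from drug-disease and disease-symptom associations to direct drug-symptom associations is as a composition of two mappings. The sketch below is an illustration only, not the authors' implementation: the function name `drug_symptom_features`, the binary encoding, and all drug, disease, and symptom names are hypothetical placeholders.

```python
from collections import defaultdict


def drug_symptom_features(drug_disease, disease_symptom):
    """Compose drug->disease and disease->symptom links into drug->symptom features.

    Returns the sorted symptom vocabulary and, for each drug, a binary vector
    with 1 where the drug is linked to that symptom through at least one disease.
    The binary encoding is an assumption made for this sketch.
    """
    linked = defaultdict(set)
    for drug, diseases in drug_disease.items():
        for disease in diseases:
            # Diseases without known symptoms contribute nothing.
            linked[drug].update(disease_symptom.get(disease, ()))

    symptoms = sorted({s for found in linked.values() for s in found})
    features = {
        drug: [1 if s in found else 0 for s in symptoms]
        for drug, found in linked.items()
    }
    return symptoms, features


# Made-up placeholder data, purely for illustration.
drug_disease = {
    "drug_a": ["disease_x"],
    "drug_b": ["disease_x", "disease_y"],
    "drug_c": ["disease_z"],
}
disease_symptom = {
    "disease_x": ["symptom_2", "symptom_4"],
    "disease_y": ["symptom_1", "symptom_2"],
    "disease_z": ["symptom_3"],
}

symptoms, features = drug_symptom_features(drug_disease, disease_symptom)
for drug, vector in features.items():
    print(drug, vector)
```

Each resulting vector plays the role of one row in a drugs-by-symptoms matrix, which is the kind of input a clustering algorithm such as K-Means would take.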
The Bisecting K-Means model showed the best performance, with a silhouette coefficient of 0.647, and generated 138 drug clusters. The drugs within the optimal clusters showed good similarity based on i) gene ontology annotations of the gene targets, ii) chemical ontology annotations, and iii) maximum common substructure of the drugs. In the web application, the model also provides a confidence score for each drug it predicts from a new set of input symptoms.
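For readers unfamiliar with the metric, the silhouette coefficient compares, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), as (b - a) / max(a, b), and averages the result over all points. The following is a minimal sketch of that standard definition, assuming Euclidean distance; the abstract does not state which distance was used, and the function name and toy points are hypothetical.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points, using Euclidean distance (an assumption)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    def mean_distance(i, members):
        # Mean distance from point i to the given members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_distance(i, own)
        b = min(
            mean_distance(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy points: two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficient(points, [0, 0, 1, 1])  # pairs grouped together
bad = silhouette_coefficient(points, [0, 1, 0, 1])   # pairs split apart
print(good > bad)
```

Values close to 1 indicate compact, well-separated clusters, and values near or below 0 indicate overlapping or misassigned points, which is why the metric can be used to compare candidate clustering models.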
In summary, direct associations between drugs and symptoms were computed and used to develop a symptom-based drug prediction tool for LSDs with unsupervised ML models. The ML-based prediction can provide a second opinion to clinicians to aid their decision-making for early treatment of LSD patients. The web application (URL - http://bicresources.jcbose.ac.in/ssaha4/sdldpred) provides a simple interface for all end-users to perform the ML-based prediction.