Wichka Ibrahim, Lai Pin-Kuang
Department of Chemical Engineering and Materials Science, Stevens Institute of Technology, Hoboken, NJ 07030, USA.
Comput Struct Biotechnol J. 2024 Oct 16;23:3669-3679. doi: 10.1016/j.csbj.2024.10.019. eCollection 2024 Dec.
Celiac disease poses a significant health challenge for individuals consuming gluten-containing foods. Although gluten-free products have become more widely available, therapeutic treatments are still needed. Advances in computational drug design, particularly bio-cheminformatics-oriented machine learning, offer promising avenues for developing such therapies. One promising target is transglutaminase 2 (TG2), a protein involved in the autoimmune response triggered by gluten consumption. In this study, we used data from approximately 1100 TG2 inhibition assays to develop ligand-based molecular screening techniques using ensemble machine-learning models and extensive molecular feature libraries. Various classifiers, including tree-based methods, artificial neural networks, and graph neural networks, were evaluated to identify primary systems for predictive analysis and feature significance assessment. Boosting ensembles built from perceptron-based deep learning and low-depth random forest weak learners emerged as the most effective, achieving over 90% accuracy and significantly outperforming a baseline of 64%. Key features, such as the presence of a terminal Michael acceptor group and a sulfonamide group, were identified as important for activity. Additionally, a regression model was created to rank active compounds. We developed a web application, Celiac Informatics (https://celiac-informatics-v1-2b0a85e75868.herokuapp.com), to facilitate the screening of potential therapeutic molecules for celiac disease. The web app also provides drug-likeness reports, supporting the development of novel drugs.
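To make the boosting idea in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes AdaBoost with one-feature decision stumps as the weak learners (swapped in for the perceptron and low-depth random forest learners named above, to keep the example dependency-free), and binary substructure flags such as a hypothetical `has_michael_acceptor` column. It also assumes the 64% baseline corresponds to always predicting the majority class, which the abstract does not state. The toy rows are invented for illustration and are not assay data.

```python
import math


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class (labels are +1/-1)."""
    positives = sum(1 for label in labels if label == 1)
    return max(positives, len(labels) - positives) / len(labels)


def stump_predict(row, feature, polarity):
    """One-feature weak learner: vote `polarity` if the flag is set, else the opposite."""
    return polarity if row[feature] == 1 else -polarity


def fit_stump(X, y, weights):
    """Return (weighted_error, feature, polarity) of the best stump under `weights`."""
    best = None
    for feature in range(len(X[0])):
        for polarity in (1, -1):
            error = sum(
                w for row, label, w in zip(X, y, weights)
                if stump_predict(row, feature, polarity) != label
            )
            if best is None or error < best[0]:
                best = (error, feature, polarity)
    return best


def adaboost(X, y, rounds=10):
    """Train an AdaBoost ensemble of stumps; returns a list of (alpha, feature, polarity)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, feature, polarity = fit_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, feature, polarity))
        # Up-weight misclassified molecules so the next stump focuses on them.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, feature, polarity))
            for row, label, w in zip(X, y, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps: +1 (active) or -1 (inactive)."""
    score = sum(alpha * stump_predict(row, f, p) for alpha, f, p in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical columns: [has_michael_acceptor, has_sulfonamide, other_flag]
    toy_X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0],
             [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
    toy_y = [1, 1, 1, -1, -1, 1, -1, -1]  # invented labels, +1 = active
    model = adaboost(toy_X, toy_y, rounds=5)
    hits = sum(predict(model, row) == label for row, label in zip(toy_X, toy_y))
    print("baseline accuracy:", majority_baseline_accuracy(toy_y))
    print("ensemble training accuracy:", hits / len(toy_y))
```

In this sketch the stump weights (`alpha`) give a rough sense of which flags drive the prediction, loosely analogous to the feature significance assessment the abstract mentions. A realistic screen would compute descriptors with a cheminformatics toolkit and evaluate on held-out data.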