Shionyu-Mitsuyama Clara, Ohmori Satoshi, Hirata Subaru, Ishida Hirokazu, Shirai Tsuyoshi
Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama, Shiga, Japan.
Faculty of Data Science, Shiga University, 1-1-1 Banba, Hikone, Shiga, Japan.
Front Bioinform. 2025 Jul 18;5:1627836. doi: 10.3389/fbinf.2025.1627836. eCollection 2025.
Intrinsically disordered regions (IDRs) of proteins have traditionally been overlooked as drug targets. However, with growing recognition of their crucial role in biological activity and their involvement in various diseases, IDRs have emerged as promising targets for drug discovery. Despite this potential, rational methodologies for IDR-targeted drug discovery remain underdeveloped, primarily due to a lack of reference experimental data.
This study explores a machine learning approach to predict IDR functions, drug interaction sites, and interacting molecular substructures within IDR sequences. To address the data gap, stepwise transfer learning was employed. IDRdecoder sequentially generates predictions for IDR classification, interaction sites, and interacting ligand substructures. In the first step, the neural network was trained as an autoencoder on 26,480,862 predicted IDR sequences. It was then trained via transfer learning on 57,692 ligand-binding PDB sequences with a higher IDR tendency to predict ligand-interacting sites and ligand types.
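The stepwise idea can be illustrated with a toy sketch: an encoder is first fitted on unlabeled sequences by reconstruction, then kept frozen while a small prediction head is fitted on labeled residues. This is not the IDRdecoder architecture, which the abstract does not specify; the per-residue one-hot input, the linear layers, the latent size, the function names, and all sequences and labels below are assumptions made purely for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LATENT = 4  # hypothetical latent size, chosen only for illustration


def one_hot(residue):
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def train_autoencoder(sequences, epochs=5, lr=0.05, seed=0):
    """Step 1: fit encoder and decoder by reconstructing unlabeled residues."""
    rng = random.Random(seed)
    enc = init(LATENT, 20, rng)  # encoder: 20 -> LATENT
    dec = init(20, LATENT, rng)  # decoder: LATENT -> 20
    for _ in range(epochs):
        for seq in sequences:
            for res in seq:
                x = one_hot(res)
                z = matvec(enc, x)
                err = [p - t for p, t in zip(matvec(dec, z), x)]
                # Gradient w.r.t. the latent code, taken before dec is updated.
                grad_z = [sum(dec[i][k] * err[i] for i in range(20))
                          for k in range(LATENT)]
                for i in range(20):
                    for k in range(LATENT):
                        dec[i][k] -= lr * err[i] * z[k]
                for k in range(LATENT):
                    for j in range(20):
                        enc[k][j] -= lr * grad_z[k] * x[j]
    return enc, dec


def train_site_head(enc, labeled, epochs=20, lr=0.5):
    """Step 2: keep the encoder frozen and fit a logistic head on labels."""
    w, b = [0.0] * LATENT, 0.0
    for _ in range(epochs):
        for seq, labels in labeled:
            for res, y in zip(seq, labels):
                z = matvec(enc, one_hot(res))
                p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
                w = [wi - lr * (p - y) * zi for wi, zi in zip(w, z)]
                b -= lr * (p - y)
    return w, b


def predict_sites(enc, head, seq):
    """Per-residue interaction-site scores in (0, 1)."""
    w, b = head
    return [sigmoid(sum(wi * zi for wi, zi in zip(w, matvec(enc, one_hot(r)))) + b)
            for r in seq]


# Made-up data: unlabeled sequences for step 1, labeled residues for step 2.
unlabeled = ["MSEQKRPAAG", "GSYGQSSYSS", "PEEKKAAPTT"]
labeled = [("GSYAQK", [0, 0, 1, 1, 0, 0])]
enc, dec = train_autoencoder(unlabeled)
head = train_site_head(enc, labeled)
scores = predict_sites(enc, head, "AYGK")
```

The point of the design is that the large unlabeled set shapes the representation, while the much smaller labeled set only has to fit the head. A context-free per-residue model like this one is far too weak for real use and serves only to show the two-stage flow.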
IDRdecoder was evaluated against nine IDR sequences that have been experimentally characterized in detail as drug targets. In the encoding space, specific GO terms related to the hypothesized functions of the evaluation IDR sequences were highly enriched. For predicting drug-interacting sites and ligand types, the model achieved areas under the curve (AUC) of 0.616 and 0.702, respectively. Compared with existing methods, including ProteinBERT, IDRdecoder showed moderately improved performance.
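For readers unfamiliar with the metric, the sketch below computes an AUC, assuming it refers to the area under the ROC curve, as the probability that a randomly chosen positive residue receives a higher score than a randomly chosen negative one. The function name, labels, and scores are made up for illustration and are unrelated to the reported values; an AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC AUC; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up per-residue labels (1 = interaction site) and predicted scores.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
auc = roc_auc(labels, scores)  # 8 of 9 positive-negative pairs are ranked correctly
```

The pairwise form is quadratic in the number of residues, which is fine for a small evaluation set such as nine sequences; a rank-based formulation would be preferable for large datasets.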
IDRdecoder is the first application for predicting drug interaction sites and ligands in IDR sequences. Analysis of the prediction results revealed characteristics beneficial for IDR-drug design; for instance, Tyr and Ala are preferred target sites, while flexible substructures, such as alkyl groups, are favored in ligand molecules.