Namba Satoko, Li Chen, Yuyama Otani Noriko, Yamanishi Yoshihiro
Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, 820-8502, Japan.
Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, Aichi, 464-8601, Japan.
Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf039.
Identifying effective therapeutic targets poses a challenge in drug discovery, especially for uncharacterized diseases without known therapeutic targets (e.g. rare diseases, intractable diseases).
This study presents a novel machine learning approach that uses multimodal vector-quantized variational autoencoders (VQ-VAEs) to predict therapeutic target molecules across diseases. To address the scarcity of known therapeutic target-disease associations, we incorporate information on uncharacterized diseases without known targets and on uncharacterized proteins without known indications (applicable diseases) in a semi-supervised learning (SSL) framework. The method integrates disease-specific and protein perturbation profiles with genetic perturbations (e.g. gene knockdowns and gene overexpressions) at the transcriptome level. Cross-cell representation learning with VQ-VAEs was performed to extract informative features from protein perturbation profiles across diverse human cell types. In parallel, cross-disease representation learning with VQ-VAEs was performed to extract informative features reflecting disease states from disease-specific profiles. The model's applicability to uncharacterized diseases or proteins is enhanced by considering the consistency between disease-specific and patient-specific signatures. The efficacy of the method is demonstrated for 79 diseases in three practical scenarios: target repositioning for target-disease pairs, new target prediction for uncharacterized diseases, and new indication prediction for uncharacterized proteins. The method is expected to be valuable for identifying therapeutic targets across various diseases.
Code: github.com/YamanishiLab/SSL-VQ and Data: 10.5281/zenodo.14644837.
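As an illustration of the vector-quantization step that VQ-VAEs rely on, the following is a minimal NumPy sketch, not the authors' implementation (see the SSL-VQ repository for that). It assumes an encoder has already produced continuous feature vectors from expression profiles; the function names `quantize` and `vq_loss_value`, the codebook size, and the weight `beta=0.25` are hypothetical choices made for this example only.

```python
import numpy as np


def quantize(z, codebook):
    """Replace each row of z (n, d) with its nearest row of codebook (k, d).

    Returns the quantized vectors and the index of the chosen code per row.
    """
    # Squared Euclidean distance between every encoding and every code vector.
    diff = z[:, None, :] - codebook[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)          # shape (n, k)
    idx = dist2.argmin(axis=1)                # nearest code per encoding
    return codebook[idx], idx


def vq_loss_value(z, z_q, beta=0.25):
    """Forward value of the codebook + commitment terms of a VQ-VAE loss.

    Both terms equal the same mean squared error in value; in a real
    autodiff framework they differ only in where the stop-gradient is
    placed, which plain NumPy does not model. beta is an assumed weight.
    """
    mse = float(np.mean((z - z_q) ** 2))
    return mse + beta * mse


# Toy example: three 2-dimensional encodings and a 3-entry codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2], [3.5, 4.2]])
z_q, idx = quantize(z, codebook)
loss = vq_loss_value(z, z_q)
```

In this toy case each encoding is snapped to a different codebook entry, so the discrete indices in `idx` act as a compact representation of the input profiles. A trainable model would additionally need an encoder, a decoder, and gradient handling for the codebook.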