Health Data Science, Harvard University, Boston, MA 02120, USA.
Analytics Center of Excellence, IQVIA, Cambridge, MA 02139, USA.
Bioinformatics. 2021 May 5;37(6):830-836. doi: 10.1093/bioinformatics/btaa880.
Drug-target interaction (DTI) prediction is a foundational task for in-silico drug discovery, a process that is costly and time-consuming because it requires experimental search over a large space of drug compounds. Recent years have seen promising progress in deep learning for DTI prediction. However, the following challenges remain open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI and thus produce results that are less accurate and difficult to explain, and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data.
We propose the Molecular Interaction Transformer (MolTrans) to address these limitations via (i) a knowledge-inspired sub-structural pattern mining algorithm and an interaction modeling module for more accurate and interpretable DTI prediction, and (ii) an augmented transformer encoder that better extracts and captures the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show that it improves DTI prediction performance compared to state-of-the-art baselines.
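To make the idea of mining sub-structural patterns from unlabeled sequences concrete, the following is a minimal sketch, not the MolTrans implementation. It assumes a byte-pair-encoding-style procedure that repeatedly merges the most frequent adjacent token pair; the abstract does not specify the mining algorithm. The function name `mine_substructures`, the `num_merges` parameter, and the toy input strings are hypothetical, and treating each character as an initial token is a simplification (for example, a two-letter atom such as Cl would be split).

```python
from collections import Counter


def mine_substructures(sequences, num_merges):
    """Greedy BPE-style mining of frequent consecutive sub-sequences.

    Illustrative assumption only: returns the list of merged tokens
    (in merge order) and the re-tokenized corpus.
    """
    # Start with every character as its own token.
    corpus = [list(seq) for seq in sequences]
    vocab = []
    for _ in range(num_merges):
        # Count every adjacent token pair across the whole corpus.
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:
            # No pair repeats, so there is nothing frequent left to merge.
            break
        merged = left + right
        vocab.append(merged)
        # Replace each non-overlapping occurrence of the pair, left to right.
        new_corpus = []
        for tokens in corpus:
            out, i = [], 0
            while i < len(tokens):
                if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                    out.append(merged)
                    i += 2
                else:
                    out.append(tokens[i])
                    i += 1
            new_corpus.append(out)
        corpus = new_corpus
    return vocab, corpus


# Toy SMILES-like strings, chosen only for illustration.
vocab, tokenized = mine_substructures(["CCO", "CCN", "CCC"], num_merges=1)
print(vocab, tokenized)
```

In this sketch the mined tokens would serve as the sub-structure vocabulary whose embeddings an encoder consumes; the same routine could be applied to protein sequences, again as an assumption rather than a description of the published method.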
The model scripts are available at https://github.com/kexinhuang12345/moltrans.
Supplementary data are available at Bioinformatics online.