Spiro Adam, Fernández García Jonatan, Yanover Chen
Machine Learning for Healthcare and Life Sciences, Department of Health Informatics, IBM Research, Haifa, Israel.
JAMIA Open. 2019 Jul 1;2(3):378-385. doi: 10.1093/jamiaopen/ooz022. eCollection 2019 Oct.
Identifying new relations between medical entities, such as drugs, diseases, and side effects, is typically a resource-intensive task, involving experimentation and clinical trials. The increased availability of related data and curated knowledge enables a computational approach to this task, notably by training models to predict likely relations. Such models rely on meaningful representations of the medical entities being studied. We propose a generic feature vector representation that leverages co-occurrences of medical terms linked with PubMed citations.
We demonstrate the usefulness of the proposed representation by inferring two types of relations: a drug causes a side effect, and a drug treats an indication. To predict these relations and assess their effectiveness, we applied two modeling approaches: multi-task modeling using neural networks and single-task modeling based on gradient boosting machines and logistic regression.
These trained models, which predict either side effects or indications, obtained significantly better results than baseline models that use a single direct co-occurrence feature. The results demonstrate the advantage of a comprehensive representation.
Selecting the appropriate representation has an immense impact on the predictive performance of machine learning models. Our proposed representation is powerful, as it spans multiple medical domains and can be used to predict a wide range of relation types.
The discovery of new relations between various medical entities can be translated into meaningful insights, for example, related to drug development or disease understanding. Our representation of medical entities can be used to train models that predict such relations, thus accelerating healthcare-related discoveries.
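As a rough illustration of the co-occurrence idea described above, the following minimal sketch builds a feature vector for a medical entity by counting how often it appears together with every other term across a set of citations. The toy corpus, the function names, and the use of raw counts are all assumptions made for illustration; the abstract does not specify how the authors compute or normalize their features.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each citation is the set of medical terms linked to it.
citations = [
    {"aspirin", "headache", "bleeding"},
    {"aspirin", "bleeding"},
    {"ibuprofen", "headache"},
    {"aspirin", "ibuprofen", "nausea"},
]


def cooccurrence_counts(citations):
    """Count, for every unordered pair of terms, the citations containing both."""
    counts = Counter()
    for terms in citations:
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts


def feature_vector(entity, vocabulary, counts):
    """Represent an entity by its co-occurrence count with each vocabulary term."""
    # Pairs are stored with sorted keys; a Counter returns 0 for unseen pairs,
    # which also covers the entity paired with itself.
    return [counts[tuple(sorted((entity, term)))] for term in vocabulary]


vocabulary = sorted(set().union(*citations))
counts = cooccurrence_counts(citations)

aspirin_vec = feature_vector("aspirin", vocabulary, counts)
ibuprofen_vec = feature_vector("ibuprofen", vocabulary, counts)

# A baseline of the kind mentioned above would use only the single direct
# co-occurrence of the two entities in the candidate relation.
direct_feature = counts[("aspirin", "bleeding")]
```

In this sketch, the full vector would be the input to a relation-prediction model, whereas `direct_feature` corresponds to the single-feature baseline the abstract compares against.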