Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), DCEXS, Pompeu Fabra University (UPF), Barcelona, Spain.
Department of Toxicogenomics, Maastricht University, Maastricht, The Netherlands.
Biol Direct. 2021 Jan 12;16(1):5. doi: 10.1186/s13062-020-00288-x.
Drug-induced liver injury (DILI) is an adverse reaction in which commonly used drugs damage the liver. DILI is estimated to affect around 20 in 100,000 inhabitants worldwide each year. Despite being one of the main causes of liver failure, its pathophysiology and mechanisms are poorly understood. In the present study, we developed an ensemble learning approach based on different features (CMap gene expression, chemical structures, drug targets) to predict drugs that might cause DILI and to gain a better understanding of the mechanisms linked to the adverse reaction.
We searched for gene signatures in CMap gene expression data using two approaches: phenotype-gene association data from DisGeNET, and a non-parametric test comparing the gene expression of DILI-Concern and No-DILI-Concern drugs (as per DILIrank definitions). The average accuracy of the classifiers in both approaches was 69%. Using chemical structures as features gave an accuracy of 65%. Combining both types of features produced an accuracy of around 63%, but improved accuracy on the independent hold-out test to 67%. Using drug-target associations as features gave the best accuracy (70%) on the independent hold-out test.
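The second signature-search approach can be illustrated with a minimal sketch. The abstract does not say which non-parametric test was used, so the Mann-Whitney U statistic below is an assumption, as are the gene names, the toy expression values, and the 0.3 effect-size cutoff; none of these come from the study.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: count of (a, b) pairs with a > b (ties count 0.5)."""
    ranks = rank_with_ties(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:len(group_a)])
    return rank_sum_a - len(group_a) * (len(group_a) + 1) / 2


def select_signature(expression, min_shift=0.3):
    """Keep genes whose U / (n_a * n_b) lies at least min_shift away from 0.5."""
    selected = []
    for gene, (dili, no_dili) in expression.items():
        effect = mann_whitney_u(dili, no_dili) / (len(dili) * len(no_dili))
        if abs(effect - 0.5) >= min_shift:
            selected.append(gene)
    return selected


# Hypothetical toy data: gene -> (DILI-Concern values, No-DILI-Concern values).
toy_expression = {
    "GENE_A": ([2.1, 1.8, 2.5, 2.2], [0.3, 0.1, 0.4, 0.2]),
    "GENE_B": ([0.5, 0.1, 0.9, 0.3], [0.4, 0.2, 0.8, 0.6]),
}
signature = select_signature(toy_expression)
```

In this toy input, GENE_A separates the two drug groups completely while GENE_B does not, so only GENE_A would enter the signature. A real analysis would also need p-values and multiple-testing correction, which this sketch omits.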
When using CMap gene expression data, searching for a specific gene signature among the landmark genes improves the quality of the classifiers, but it is still limited by the intrinsic noise of the dataset. When using chemical structures as features, the structural diversity of the known DILI-causing drugs hampers the prediction, a problem similar to the one encountered with gene expression information. The combination of both features did not improve the quality of the classifiers but increased their robustness, as shown on independent hold-out tests. The use of drug-target associations as features improved the prediction, especially the specificity, and the results were comparable to those of previous studies.
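For readers less familiar with the metrics mentioned above, the following is a minimal sketch of how accuracy, sensitivity, and specificity relate for a binary DILI classifier. The labels and predictions are hypothetical toy values, not results from the study, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN where 1 = DILI-Concern and 0 = No-DILI-Concern."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        # Fraction of all drugs classified correctly.
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Fraction of DILI-Concern drugs that were flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of No-DILI-Concern drugs that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical hold-out set: 4 DILI-Concern drugs and 6 No-DILI-Concern drugs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

A classifier can reach a given accuracy with very different balances of sensitivity and specificity, which is why a gain concentrated in specificity is reported separately.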