School of Informatics and Computing, Indiana University, Bloomington, IN, USA.
PLoS Comput Biol. 2012;8(7):e1002574. doi: 10.1371/journal.pcbi.1002574. Epub 2012 Jul 5.
The rapidly increasing amount of public data in chemistry and biology provides new opportunities for large-scale data mining in drug discovery. Systematically integrating these heterogeneous data sets, and providing algorithms to mine the integrated data, would permit investigation of the complex mechanisms of action of drugs. In this work we integrated and annotated data from public datasets relating to drugs, chemical compounds, protein targets, diseases, side effects and pathways, building a semantic linked network of over 290,000 nodes and 720,000 edges. We developed a statistical model that assesses the association of a drug-target pair based on the pair's relations with other linked objects in the network. Validation experiments demonstrate that the model correctly identifies known direct drug-target pairs with high precision. Indirect drug-target pairs (for example, drugs that change a gene's expression level) are also identified, but less strongly than direct pairs. We further calculated association scores for 157 drugs from 10 disease areas against 1683 human targets, and measured the similarity of the drugs using a [Formula: see text] score matrix. The resulting similarity network indicates that drugs from the same disease area tend to cluster together in ways that structural similarity does not capture, and it identifies several potential new drug pairings. This work thus provides a novel, validated alternative to existing drug-target prediction algorithms. The web service is freely available at: http://chem2bio2rdf.org/slap.
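The drug-similarity step described above can be illustrated with a minimal sketch. It assumes each drug is represented by a vector of association scores (one per target) and that two drugs are compared by the Pearson correlation of those vectors. The abstract does not state which similarity measure was used, so Pearson correlation, the function names, and the toy scores below are all hypothetical stand-ins, not the authors' method or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors.

    Assumption: correlation is used as the similarity measure; the
    source abstract does not specify one.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant vector has no defined correlation; treat as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def similarity_matrix(scores):
    """Build a drug-by-drug similarity matrix from per-drug score vectors.

    `scores` maps a drug name to its list of drug-target association
    scores, with targets in the same order for every drug.
    """
    drugs = sorted(scores)
    matrix = [[pearson(scores[a], scores[b]) for b in drugs] for a in drugs]
    return drugs, matrix


# Toy, made-up association scores for three drugs against four targets.
scores = {
    "drug_a": [0.9, 0.1, 0.8, 0.0],
    "drug_b": [0.8, 0.2, 0.9, 0.1],
    "drug_c": [0.1, 0.9, 0.0, 0.7],
}

drugs, sim = similarity_matrix(scores)
for name, row in zip(drugs, sim):
    print(name, [round(v, 2) for v in row])
```

In this toy input, drug_a and drug_b have similar target profiles and so score as more similar to each other than either does to drug_c. With 157 drugs, the same procedure would yield one row and one column per drug, which can then be thresholded or clustered to form a similarity network.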