BMC Bioinformatics. 2013 Jun 26;14:207. doi: 10.1186/1471-2105-14-207.
Drug side effects are a common reason for halting drug development during clinical trials. Improving our understanding of drug side effects is necessary to reduce attrition rates during drug development, as well as the risk of discovering novel side effects in available drugs. Today, most investigations deal with isolated side effects and overlook their possible redundancy and frequent co-occurrence.
In this work, drug annotations are collected from the SIDER and DrugBank databases. Terms describing individual side effects reported in SIDER are grouped into term clusters (TCs) using a semantic similarity measure. Maximal frequent itemsets are extracted from the resulting drug × TC binary table, leading to the identification of what we call side-effect profiles (SEPs). A SEP is defined as the longest combination of TCs that is shared by a significant number of drugs. Frequent SEPs are explored on the basis of integrated drug and target descriptors using two machine learning methods: decision trees and inductive logic programming (ILP). Although both methods yield explicit models, ILP performs relational learning and can exploit not only drug properties but also background knowledge. Learning efficiency is evaluated by cross-validation and by direct testing on new molecules. Comparison of the two methods shows that ILP achieves greater sensitivity than decision trees and successfully exploits background knowledge, such as functional annotations and pathways of drug targets, thereby producing rich and expressive rules. All models and theories are available on a dedicated web site.
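The SEP definition corresponds to maximal frequent itemset mining over the binary drug × TC table. A SEP is a frequent TC combination that cannot be extended by another TC without falling below the support threshold. Below is a minimal, naive sketch of that idea. It assumes the table is given as a mapping from each drug to its set of TCs, and that "a significant number of drugs" is expressed as an absolute threshold `min_support`. The drug and TC names are hypothetical toy data. The level-wise Apriori-style search is an illustrative stand-in, not necessarily the miner used in this work.

```python
from itertools import combinations

# Hypothetical toy drug x TC table: each drug maps to the set of TCs it is annotated with.
drug_tc = {
    "drugA": {"TC_nausea", "TC_headache", "TC_rash"},
    "drugB": {"TC_nausea", "TC_headache"},
    "drugC": {"TC_nausea", "TC_headache", "TC_dizziness"},
    "drugD": {"TC_rash", "TC_dizziness"},
}


def support(table, itemset):
    """Number of drugs whose TC set contains every TC in `itemset`."""
    return sum(1 for tcs in table.values() if itemset <= tcs)


def frequent_itemsets(table, min_support):
    """Level-wise (Apriori-style) enumeration of all TC combinations
    shared by at least `min_support` drugs."""
    items = sorted({tc for tcs in table.values() for tc in tcs})
    level = [frozenset([tc]) for tc in items
             if support(table, frozenset([tc])) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        # Join frequent k-itemsets differing by one TC into (k+1)-candidates;
        # anti-monotonicity of support guarantees no frequent itemset is missed.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(table, c) >= min_support]
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


seps = maximal_itemsets(frequent_itemsets(drug_tc, min_support=2))
print(sorted(sorted(s) for s in seps))
# -> [['TC_dizziness'], ['TC_headache', 'TC_nausea'], ['TC_rash']]
```

With `min_support=2`, {TC_headache, TC_nausea} is maximal because adding any third TC leaves fewer than two drugs. The singletons TC_rash and TC_dizziness are maximal because no pair containing them reaches the threshold. Exhaustive enumeration is exponential in the worst case, so this sketch only suits small tables.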
Side-effect profiles covering a significant number of drugs have been extracted from a drug × side-effect association table. Background knowledge concerning both the chemical and biological spaces has been integrated with a relational learning method to discover rules that explicitly characterize drug-SEP associations. These rules have been successfully used to predict SEPs associated with new drugs.
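Because the learned models are explicit rule sets, predicting a SEP for a new drug amounts to checking whether any rule covers that drug's description. The sketch below assumes a theory for one SEP is represented as a list of Python predicates over a hypothetical `Drug` record (targets, target pathways, chemical properties). The rule conditions, identifiers such as `pathway_P1` and `target_T7`, and the labels are invented for illustration and are not rules reported in this study. Sensitivity is computed as TP / (TP + FN).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Drug:
    """Hypothetical drug record mixing chemical and biological background knowledge."""
    name: str
    targets: Set[str] = field(default_factory=set)          # protein targets
    pathways: Set[str] = field(default_factory=set)         # pathways of those targets
    properties: Dict[str, float] = field(default_factory=dict)  # chemical descriptors


Rule = Callable[[Drug], bool]

# Hypothetical theory for one SEP: a disjunction of conjunctive rules.
# Conditions are invented for illustration, not taken from the study.
theory_sep1: List[Rule] = [
    lambda d: "pathway_P1" in d.pathways and d.properties.get("logP", 0.0) > 3.0,
    lambda d: "target_T7" in d.targets,
]


def predict(theory: List[Rule], drug: Drug) -> bool:
    """A drug is predicted to carry the SEP if any rule in the theory covers it."""
    return any(rule(drug) for rule in theory)


def sensitivity(preds: List[bool], labels: List[bool]) -> float:
    """TP / (TP + FN): fraction of drugs truly carrying the SEP that are predicted."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    return tp / (tp + fn) if tp + fn else 0.0


new_drugs = [
    Drug("new1", targets={"target_T7"}),
    Drug("new2", pathways={"pathway_P1"}, properties={"logP": 4.2}),
    Drug("new3", pathways={"pathway_P1"}, properties={"logP": 1.0}),
]
labels = [True, False, True]  # hypothetical ground truth for the SEP
preds = [predict(theory_sep1, d) for d in new_drugs]
print(preds, sensitivity(preds, labels))  # -> [True, True, False] 0.5
```

Each rule is readable on its own, which is what makes such models explicit. The same representation would also cover a decision tree, whose root-to-leaf paths can be read off as conjunctive rules.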