Merck KGaA, Frankfurter Straße, Darmstadt, Germany.
IBM Watson Health, Almaden, California, United States of America.
PLoS One. 2019 Apr 8;14(4):e0214619. doi: 10.1371/journal.pone.0214619. eCollection 2019.
Pharmacodynamic biomarkers are becoming increasingly valuable for assessing drug activity and target modulation in clinical trials. However, identifying high-quality biomarkers is challenging because of the growing volume and heterogeneity of the data describing the biological networks that underlie disease mechanisms. A biological pathway network includes entities (e.g. genes, proteins and chemicals/drugs) and the relationships between them, and is typically curated or mined from structured databases and textual co-occurrence data. We propose a hybrid approach that combines Natural Language Processing with directed relationship-based network analysis, using IBM Watson for Drug Discovery to rank all human genes and identify candidate biomarkers. The approach requires only an initial determination of a specific target-disease relationship.
Through natural language processing of scientific literature, Watson for Drug Discovery creates a network of semantic relationships between biological concepts such as genes, drugs, and diseases. Using Bruton's tyrosine kinase as a case study, Watson for Drug Discovery's automatically extracted relationship network was compared with a prominent manually curated physical interaction network. Additionally, potential biomarkers for Bruton's tyrosine kinase inhibition were predicted using a matrix factorization approach and subsequently compared with expert-generated biomarkers.
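The abstract does not describe the matrix factorization itself, so the following is only a minimal sketch of the general technique, not the method used in the study. It assumes a small hypothetical gene-by-feature association matrix (the gene names, feature names and scores are all invented for illustration), fits a low-rank factorization by stochastic gradient descent on the observed entries, and then ranks genes by their reconstructed score in a hypothetical `known_biomarker` column, including genes whose entry was unobserved (`None`).

```python
import random


def factorize(matrix, rank=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Fit matrix ~ P @ Q^T by SGD, skipping unobserved (None) entries."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i in range(n_rows):
            for j in range(n_cols):
                if matrix[i][j] is None:
                    continue  # unobserved entry: nothing to fit here
                pred = sum(P[i][k] * Q[j][k] for k in range(rank))
                err = matrix[i][j] - pred
                for k in range(rank):
                    p, q = P[i][k], Q[j][k]
                    # Gradient step with L2 regularization on both factors.
                    P[i][k] += lr * (err * q - reg * p)
                    Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Reconstructed score for row i, column j."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


# Hypothetical toy data: rows are genes, columns are relationship features.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
features = ["upstream_of_target", "downstream_of_target",
            "linked_to_disease", "known_biomarker"]
scores = [
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, None],   # biomarker status unknown: to be predicted
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, None],   # biomarker status unknown: to be predicted
]

P, Q = factorize(scores)
biomarker_col = features.index("known_biomarker")

# Rank every gene by its reconstructed biomarker score, highest first.
ranking = sorted(
    ((gene, predict(P, Q, i, biomarker_col)) for i, gene in enumerate(genes)),
    key=lambda pair: pair[1],
    reverse=True,
)
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

In this sketch the latent factors let a gene with no recorded biomarker status inherit a score from genes with similar relationship profiles, which is the general reason factorization is useful for ranking candidates that have no direct annotation.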
Watson's natural language processing generated a relationship network that matched 55 (86%) of the genes upstream of Bruton's tyrosine kinase (BTK) and 98 (95%) of the genes downstream of BTK in a prominent manually curated physical interaction network. Matrix factorization analysis ranked 11 of the 13 genes identified by Merck subject matter experts within the top 20% of Watson for Drug Discovery's 13,595 ranked genes, with 7 of them in the top 5%.
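To put the ranking result in perspective, the short calculation below converts the reported percentages into rank cutoffs and compares the observed hits with what a uniformly random ranking would be expected to place in the same window. Only the figures 13,595, 13, 11 and 7 come from the abstract; rounding the cutoffs down to whole ranks and the random-ranking baseline are assumptions made for illustration.

```python
TOTAL_RANKED = 13_595   # genes ranked (from the abstract)
EXPERT_GENES = 13       # expert-identified biomarker genes (from the abstract)
HITS = {20: 11, 5: 7}   # top-percent window -> expert genes found in it


def rank_cutoff(total, percent):
    """Largest rank inside the top `percent` window (assumes rounding down)."""
    return total * percent // 100


def fold_enrichment(hits, expert_total, percent):
    """Observed hits divided by the count expected under a random ranking."""
    expected = expert_total * percent / 100
    return hits / expected


summary = {}
for percent, hits in HITS.items():
    summary[percent] = {
        "cutoff": rank_cutoff(TOTAL_RANKED, percent),
        "recall_pct": round(100 * hits / EXPERT_GENES, 1),
        "fold": round(fold_enrichment(hits, EXPERT_GENES, percent), 2),
    }
    print(percent, summary[percent])
```

Under these assumptions the top 20% corresponds to ranks 1 to 2,719 and the top 5% to ranks 1 to 679, so the expert genes are recovered at roughly 4 times (top 20%) and 11 times (top 5%) the rate expected by chance.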
Taken together, these results suggest that Watson for Drug Discovery's automatically extracted relationship network identifies the majority of upstream and downstream genes in biological pathway networks and can help identify and prioritize pharmacodynamic biomarkers for evaluation, accelerating the early phases of disease hypothesis generation.