MicroDiscovery GmbH, Marienburger Straße 1, 10405, Berlin, Germany.
Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63, 14195, Berlin, Germany.
J Transl Med. 2021 Jun 26;19(1):274. doi: 10.1186/s12967-021-02941-z.
A large body of scientific literature describes the relations between tumor types and anti-cancer drugs. Its sheer volume makes it impossible for researchers and physicians to extract all relevant information manually.
To cope with the large amount of literature, we applied an automated text mining approach to assess the relations between the 30 most frequent cancer types and 270 anti-cancer drugs. We applied two different approaches: a classical text mining approach based on named entity recognition, and an AI-based approach employing word embeddings. The consistency of the literature mining results was validated with three independent methods: first, using data from FDA approvals; second, using experimentally measured IC50 cell line data; and third, using clinical patient survival data.
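The classical, entity-based approach can be illustrated with a minimal sketch. It assumes dictionary lookup as the named entity recognition step and counts how many abstracts mention each cancer type together with each drug. The dictionaries, the example abstracts, and the function names `find_entities` and `cooccurrence_counts` are hypothetical and do not come from the study, whose actual pipeline and scoring are not specified here.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical mini-dictionaries mapping a canonical name to its synonyms.
# The study covered 30 cancer types and 270 drugs; these entries are toy data.
CANCER_TERMS = {
    "breast cancer": ["breast cancer", "breast carcinoma"],
    "melanoma": ["melanoma"],
}
DRUG_TERMS = {
    "tamoxifen": ["tamoxifen"],
    "vemurafenib": ["vemurafenib"],
}


def find_entities(text, dictionary):
    """Return the canonical names whose synonyms occur as whole words in text."""
    lowered = text.lower()
    found = set()
    for canonical, synonyms in dictionary.items():
        for synonym in synonyms:
            if re.search(r"\b" + re.escape(synonym) + r"\b", lowered):
                found.add(canonical)
                break  # one matching synonym is enough for this entity
    return found


def cooccurrence_counts(abstracts):
    """Count the abstracts that mention each (cancer type, drug) pair."""
    counts = Counter()
    for abstract in abstracts:
        cancers = find_entities(abstract, CANCER_TERMS)
        drugs = find_entities(abstract, DRUG_TERMS)
        # Each abstract contributes at most once per pair.
        counts.update(product(sorted(cancers), sorted(drugs)))
    return counts


# Invented example sentences standing in for literature abstracts.
abstracts = [
    "Tamoxifen reduces recurrence in breast cancer patients.",
    "Vemurafenib improved survival in BRAF-mutant melanoma.",
    "Adjuvant tamoxifen for early breast carcinoma.",
]
counts = cooccurrence_counts(abstracts)
for (cancer, drug), n in sorted(counts.items()):
    print(f"{cancer} / {drug}: {n}")
```

Raw co-occurrence counts of this kind would typically still need normalization, since frequently studied drugs and cancer types co-occur often by chance. The embedding-based approach is not covered by this sketch.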
We demonstrated that automated text mining was able to successfully assess the relations between cancer types and anti-cancer drugs. All validation methods showed good correspondence between the results from literature mining and the independent confirmatory approaches. The relations between the most frequent cancer types and the drugs employed for their treatment were visualized in a large heatmap. All results are accessible in an interactive web-based knowledge base at https://knowledgebase.microdiscovery.de/heatmap.
Our approach is able to assess the relations between compounds and cancer types in an automated manner. Both cancer types and compounds could be grouped into different clusters. Researchers can use the interactive knowledge base to inspect the presented results and pursue their own research questions, for example the identification of novel indication areas for known drugs.
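How compounds might be grouped by their relation profiles can be sketched as follows. This is an illustration only: the scores are made up, the drug names are placeholders, and the single-pass cosine-similarity grouping with a fixed threshold is an assumption chosen for brevity, not the clustering method used in the study.

```python
import math

# Made-up relation scores, one row per drug and one column per cancer type.
# These values are illustrative and are not results from the study.
CANCER_TYPES = ["breast cancer", "melanoma", "lung cancer"]
SCORES = {
    "drug_a": [0.9, 0.1, 0.2],
    "drug_b": [0.8, 0.0, 0.3],
    "drug_c": [0.1, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_similarity(profiles, threshold=0.8):
    """Single-pass grouping: a drug joins the first group whose first member
    has a profile with cosine similarity >= threshold, else starts a new group."""
    groups = []
    for name, vector in profiles.items():
        for group in groups:
            if cosine(profiles[group[0]], vector) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


groups = group_by_similarity(SCORES)
print(groups)
```

Drugs that land in the same group share a similar pattern across cancer types, which is the kind of signal that could point to candidate new indications for a known drug.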