Laboratory of in vitro modeling systems of pulmonary and thrombotic diseases, Institute of Physiology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
Structural Bioinformatics Group, Institute of Physiology, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
PLoS Comput Biol. 2024 Sep 12;20(9):e1012417. doi: 10.1371/journal.pcbi.1012417. eCollection 2024 Sep.
The rapid growth of scientific literature makes it increasingly difficult for researchers to keep up with advances across multiple disciplines.
We apply natural language processing (NLP) and embedding learning concepts to design PubDigest, a tool that combs PubMed literature, aiming to pinpoint potential drugs that could be repurposed.
Using NLP, in particular term associations derived from word embeddings, we explored unrecognized relationships between drugs and diseases. To illustrate the utility of PubDigest, we focused on chronic thromboembolic pulmonary hypertension (CTEPH), a rare disease covered by a limited number of scientific publications.
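As a rough illustration of how term associations can be read from word embeddings, the sketch below compares terms by the cosine similarity of their vectors. The three-dimensional vectors and the term keys are invented for this example; they are not taken from PubDigest, and the abstract does not state which embedding model or similarity measure the tool uses.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (made-up numbers, for illustration only).
embeddings = {
    "cteph": [0.9, 0.1, 0.3],
    "venous_thrombosis": [0.8, 0.2, 0.4],
    "asthma": [0.1, 0.9, 0.2],
}

sim_vt = cosine_similarity(embeddings["cteph"], embeddings["venous_thrombosis"])
sim_asthma = cosine_similarity(embeddings["cteph"], embeddings["asthma"])
print(f"cteph vs venous_thrombosis: {sim_vt:.3f}")
print(f"cteph vs asthma: {sim_asthma:.3f}")
```

With these toy vectors, the first pair scores higher than the second, which is the kind of signal a similarity ranking over real embeddings would rely on.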
Our literature analysis identified key clinical features linked to CTEPH by applying term frequency-inverse document frequency (TF-IDF) scoring, a technique that measures a term's significance in a text corpus. This allowed us to map related diseases. One standout was venous thrombosis (VT), which showed strong semantic links with CTEPH. Looking deeper, we identified potential repurposing candidates for CTEPH through large-scale neural network-based contextualization of the literature and predictive modeling on both the CTEPH and the VT literature corpora, with the aim of finding novel, as yet unrecognized associations between the two diseases. Alongside the anti-thrombotic agent caplacizumab, benzofuran derivatives were an intriguing find. In particular, the benzofuran derivative amiodarone displayed potential anti-thrombotic properties in the literature. Our in vitro tests confirmed that amiodarone significantly reduced platelet aggregation, by 68% (p = 0.02). However, real-world clinical data indicated that CTEPH patients receiving amiodarone treatment faced a significantly higher mortality risk, by 15.9% (p<0.001).
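To make the TF-IDF scoring mentioned above concrete, here is a minimal sketch using one common formulation: relative term frequency multiplied by the natural log of (number of documents / number of documents containing the term). The toy corpus is invented, and the exact TF-IDF variant and preprocessing used in PubDigest are not given in the abstract, so this should be read as an assumption-laden illustration rather than the authors' implementation.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy corpus (invented text, for illustration only).
docs = [
    "cteph pulmonary hypertension thrombosis".split(),
    "venous thrombosis anticoagulation".split(),
    "pulmonary hypertension therapy".split(),
]

scores = tf_idf(docs)
# Rank the terms of the first document from most to least distinctive.
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this toy corpus, a term that appears in only one document ("cteph") scores higher than a term shared across documents ("thrombosis"), which is how TF-IDF surfaces vocabulary characteristic of a particular text.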
While NLP offers an innovative approach to interpreting scientific literature, especially for drug repurposing, it is crucial to combine it with complementary methods such as in vitro testing and real-world evidence. Our exploration of benzofuran derivatives in CTEPH underscores this point. Blending NLP with laboratory experiments and real-world clinical data can therefore pave the way for faster and safer drug repurposing, especially for rare diseases like CTEPH.