CLARINET: efficient learning of dynamic network models from literature.

Author information

Ahmed Yasmine, Telmer Cheryl A, Miskov-Zivanov Natasa

Affiliations

Electrical and Computer Engineering Department, University of Pittsburgh, Pittsburgh, PA 15213, USA.

Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

Publication information

Bioinform Adv. 2021 Jun 3;1(1):vbab006. doi: 10.1093/bioadv/vbab006. eCollection 2021.

DOI:10.1093/bioadv/vbab006

PMID:36700090

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9710628/

Abstract

MOTIVATION

Creating or extending computational models of complex systems, such as intra- and intercellular biological networks, is a time- and labor-intensive task, often limited by the knowledge and experience of modelers. Automating this process would enable rapid, consistent, comprehensive and robust analysis and understanding of complex systems.

RESULTS

In this work, we present CLARINET (CLARIfying NETworks), a novel methodology and tool for automatically expanding models using information extracted from the literature by machine reading. CLARINET creates collaboration graphs from the extracted events and uses several novel metrics to evaluate these events individually, in pairs, and in groups. These metrics are based on the frequency of occurrence and co-occurrence of events in the literature, and on their connectivity to the baseline model. We tested how well CLARINET can reproduce manually built and curated models when provided with varying amounts of information in the baseline model and in the machine reading output. Our results show that CLARINET can recover all relevant interactions that are present in the reading output, and that it automatically reconstructs manually built models with an average recall of 80% and an average precision of 70%. CLARINET is highly scalable: its average runtime is on the order of ten seconds when processing several thousand interactions, outperforming other similar methods.
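To make the ideas of occurrence frequency, co-occurrence, connectivity to a baseline model, and precision/recall concrete, the following is a minimal Python sketch. It is not the authors' implementation and does not reproduce the metrics defined in the paper. The event triples, paper IDs, baseline elements, gold-standard set, selection threshold, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical machine-reading output: paper ID -> set of extracted events.
# Each event is a (regulator, regulated, sign) triple; all values are made up.
papers = {
    "paper_1": {("A", "B", "+"), ("B", "C", "-")},
    "paper_2": {("A", "B", "+"), ("C", "D", "+")},
    "paper_3": {("A", "B", "+"), ("B", "C", "-"), ("E", "F", "+")},
}
# Elements assumed to be present already in a (hypothetical) baseline model.
baseline_nodes = {"A", "B"}


def count_occurrences(papers):
    """Count papers per event (freq) and papers per event pair (cooc)."""
    freq = Counter()
    cooc = Counter()
    for events in papers.values():
        freq.update(events)
        # Sort so that each unordered pair always maps to the same key.
        for pair in combinations(sorted(events), 2):
            cooc[pair] += 1
    return freq, cooc


def touches_baseline(event, baseline_nodes):
    """True if either endpoint of the event is already in the baseline model."""
    regulator, regulated, _sign = event
    return regulator in baseline_nodes or regulated in baseline_nodes


def precision_recall(selected, gold):
    """Precision and recall of the selected events against a gold-standard set."""
    true_positives = len(selected & gold)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


freq, cooc = count_occurrences(papers)

# Toy selection rule (an assumption, not the paper's criterion):
# keep events reported in at least two papers that connect to the baseline.
selected = {
    event for event, n in freq.items()
    if n >= 2 and touches_baseline(event, baseline_nodes)
}

# Hypothetical manually curated model used as the gold standard.
gold = {("A", "B", "+"), ("B", "C", "-"), ("C", "D", "+")}
precision, recall = precision_recall(selected, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setup the co-occurrence counts can be read as edge weights between events that appear in the same paper, which is one simple way to picture a graph built from extracted events. The actual graph construction and scoring used by CLARINET are described in the paper and the linked repository.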

AVAILABILITY AND IMPLEMENTATION

The data underlying this article are available in Bitbucket at https://bitbucket.org/biodesignlab/clarinet/src/master/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a423/9710628/fe63c0a6902e/vbab006f1.jpg

Similar articles

LitPathExplorer: a confidence-based visual text analytics tool for exploring literature-enriched pathway models.

Bioinformatics. 2018 Apr 15;34(8):1389-1397. doi: 10.1093/bioinformatics/btx774.

SKIMMR: facilitating knowledge discovery in life sciences by machine-aided skim reading.

PeerJ. 2014 Jul 22;2:e483. doi: 10.7717/peerj.483. eCollection 2014.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

FLUTE: Fast and reliable knowledge retrieval from biomedical literature.

Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa056.

The value of prior knowledge in machine learning of complex network systems.

Bioinformatics. 2017 Nov 15;33(22):3610-3618. doi: 10.1093/bioinformatics/btx438.

Advanced graph and sequence neural networks for molecular property prediction and drug discovery.

Bioinformatics. 2022 Apr 28;38(9):2579-2586. doi: 10.1093/bioinformatics/btac112.

LexExp: a system for automatically expanding concept lexicons for noisy biomedical texts.

Bioinformatics. 2021 Aug 25;37(16):2499-2501. doi: 10.1093/bioinformatics/btaa995.

Protein classification using modified n-grams and skip-grams.

Bioinformatics. 2018 May 1;34(9):1481-1487. doi: 10.1093/bioinformatics/btx823.

Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases.

Bioinformatics. 2022 Feb 7;38(5):1320-1327. doi: 10.1093/bioinformatics/btab830.

Cited by

Context-aware knowledge selection and reliable model recommendation with ACCORDION.

Front Syst Biol. 2024 Apr 18;4:1308292. doi: 10.3389/fsysb.2024.1308292. eCollection 2024.

Automated model refinement using perturbation-observation pairs.

NPJ Syst Biol Appl. 2025 Jun 16;11(1):65. doi: 10.1038/s41540-025-00532-y.

Automated assembly of molecular mechanisms at scale from text mining and curated databases.

Mol Syst Biol. 2023 May 9;19(5):e11325. doi: 10.15252/msb.202211325. Epub 2023 Mar 20.
