Construction of biological networks from unstructured information based on a semi-automated curation workflow.

Author information

Szostak Justyna, Ansari Sam, Madan Sumit, Fluck Juliane, Talikka Marja, Iskandar Anita, De Leon Hector, Hofmann-Apitius Martin, Peitsch Manuel C, Hoeng Julia

Affiliations

Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, 2000 Neuchâtel, Switzerland and.

Publication information

Database (Oxford). 2015 Jun 17;2015:bav057. doi: 10.1093/database/bav057.

DOI:10.1093/database/bav057
PMID:26200752
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5630939/
Abstract

Capture and representation of scientific knowledge in a structured format are essential to improve the understanding of biological mechanisms involved in complex diseases. Biological knowledge and knowledge about standardized terminologies are difficult to capture from literature in a usable form. A semi-automated knowledge extraction workflow is presented that was developed to allow users to extract causal and correlative relationships from scientific literature and to transcribe them into the computable and human readable Biological Expression Language (BEL). The workflow combines state-of-the-art linguistic tools for recognition of various entities and extraction of knowledge from literature sources. Unlike most other approaches, the workflow outputs the results to a curation interface for manual curation and converts them into BEL documents that can be compiled to form biological networks. We developed a new semi-automated knowledge extraction workflow that was designed to capture and organize scientific knowledge and reduce the required curation skills and effort for this task. The workflow was used to build a network that represents the cellular and molecular mechanisms implicated in atherosclerotic plaque destabilization in an apolipoprotein-E-deficient (ApoE(-/-)) mouse model. The network was generated using knowledge extracted from the primary literature. The resultant atherosclerotic plaque destabilization network contains 304 nodes and 743 edges supported by 33 PubMed referenced articles. A comparison between the semi-automated and conventional curation processes showed similar results, but significantly reduced curation effort for the semi-automated process. Creating structured knowledge from unstructured text is an important step for the mechanistic interpretation and reusability of knowledge. Our new semi-automated knowledge extraction workflow reduced the curation skills and effort required to capture and organize scientific knowledge. 
The atherosclerotic plaque destabilization network that was generated is a causal network model for vascular disease demonstrating the usefulness of the workflow for knowledge extraction and construction of mechanistically meaningful biological networks.
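The abstract describes curated relationships being written as BEL statements and compiled into a network of nodes and edges, with each edge backed by literature evidence. The following is a minimal sketch of that compilation step. It assumes a simplified BEL subset: one `subject relationship object` triple per statement, a handful of relationship names, and no nested statements or annotations. The `EX` namespace, the entity names and the `paper-*` evidence labels are hypothetical placeholders rather than content from the paper. The helpers `split_top_level` and `compile_network` are illustrative and do not come from the authors' workflow or any BEL compiler's API.

```python
from collections import defaultdict

# Assumption: a small subset of BEL relationship names; the BEL language
# defines more (and BEL 2.0 also accepts short forms such as "->").
RELATIONS = {
    "increases",
    "decreases",
    "directlyIncreases",
    "directlyDecreases",
    "positiveCorrelation",
    "negativeCorrelation",
    "association",
}


def split_top_level(statement: str) -> list[str]:
    """Split a BEL statement on whitespace that lies outside all parentheses."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in statement.strip():
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch.isspace() and depth == 0:
            # A top-level space ends the current token (a term or a relationship).
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def compile_network(statements):
    """Turn (statement, source) pairs into a node set and an edge -> sources map."""
    nodes: set[str] = set()
    edges: dict = defaultdict(set)
    for statement, source in statements:
        parts = split_top_level(statement)
        if len(parts) != 3 or parts[1] not in RELATIONS:
            raise ValueError(f"unsupported statement: {statement!r}")
        subject, relation, obj = parts
        nodes.update((subject, obj))
        # Identical triples reported by different sources merge into one edge.
        edges[(subject, relation, obj)].add(source)
    return nodes, dict(edges)


# Hypothetical toy input: the EX namespace, entity names and paper labels are
# placeholders, not statements extracted in the paper.
EXAMPLE = [
    ("p(EX:ProteinA) increases p(EX:ProteinB)", "paper-1"),
    ('p(EX:ProteinB) decreases bp(EX:"process X")', "paper-1"),
    ("p(EX:ProteinA) increases p(EX:ProteinB)", "paper-2"),
    ("act(p(EX:ProteinC)) increases p(EX:ProteinB)", "paper-2"),
]

if __name__ == "__main__":
    nodes, edges = compile_network(EXAMPLE)
    print(len(nodes), "nodes,", len(edges), "edges")  # 4 nodes, 3 edges
    for (subject, relation, obj), sources in sorted(edges.items()):
        print(subject, relation, obj, sorted(sources))
```

In this sketch, the same triple reported by two sources becomes a single edge whose evidence set grows. Node count, edge count and the number of supporting sources are therefore separate quantities.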

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c48e/5630939/55ac163e4013/bav057f1p.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c48e/5630939/383570425349/bav057f2p.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c48e/5630939/01ac0734774d/bav057f3p.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c48e/5630939/8b6b3e0fa7f0/bav057f4p.jpg
