Planning bioinformatics workflows using an expert system.

Authors

Chen Xiaoling, Chang Jeffrey T

Affiliations

School of Biomedical Informatics.

Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

Publication information

Bioinformatics. 2017 Apr 15;33(8):1210-1215. doi: 10.1093/bioinformatics/btw817.

DOI:10.1093/bioinformatics/btw817
PMID:28052928
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5860174/
Abstract

MOTIVATION

Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used.

RESULTS

To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system composed of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise.
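
To make the idea of backward chaining concrete, the following is a minimal sketch in Python. It is not BETSY's implementation or API: the `Rule` class, the `plan` function, and the rule and data type names (`align_reads`, `fastq`, and so on) are all hypothetical and chosen only for illustration. Each rule states that a tool converts some input data types into one output type, and the planner works backwards from the requested output to the data already on hand.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a tool that turns input data types into one output type."""
    name: str
    inputs: Tuple[str, ...]
    output: str


# Toy knowledge base. Names are illustrative assumptions, not BETSY's rules.
RULES = [
    Rule("align_reads", ("fastq", "reference_genome"), "bam"),
    Rule("sort_bam", ("bam",), "sorted_bam"),
    Rule("call_variants", ("sorted_bam", "reference_genome"), "vcf"),
]


def plan(
    goal: str,
    available: Set[str],
    rules: Sequence[Rule],
    _seen: FrozenSet[str] = frozenset(),
) -> Optional[List[str]]:
    """Backward-chain from `goal` to the data types in `available`.

    Returns an ordered list of rule names that produces `goal`,
    or None if no workflow can be assembled from the rules.
    """
    if goal in available:
        return []  # nothing to do, the data is already present
    if goal in _seen:
        return None  # guard against cyclic rules
    for rule in rules:
        if rule.output != goal:
            continue
        steps: List[str] = []
        for needed in rule.inputs:
            sub = plan(needed, available, rules, _seen | {goal})
            if sub is None:
                break  # this rule cannot be satisfied, try the next one
            # Keep order, skip steps already scheduled for another input.
            steps.extend(s for s in sub if s not in steps)
        else:
            return steps + [rule.name]
    return None


workflow = plan("vcf", {"fastq", "reference_genome"}, RULES)
print(workflow)
```

Under these assumptions, asking for `vcf` from raw reads and a reference genome yields a three-step workflow, and the length of the returned list gives a crude count of the steps needed to reach a result. A real knowledge base would also need to model attributes of the data (for example, whether a file is sorted or normalized), which this sketch reduces to distinct type names.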

AVAILABILITY AND IMPLEMENTATION

https://github.com/jefftc/changlab.

CONTACT

jeffrey.t.chang@uth.tmc.edu.
