Chen Xiaoling, Chang Jeffrey T
School of Biomedical Informatics and Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Bioinformatics. 2017 Apr 15;33(8):1210-1215. doi: 10.1093/bioinformatics/btw817.
Bioinformatic analyses are becoming increasingly complex because of the growing number of steps required to process the data and the proliferation of methods available for each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis. They are therefore difficult to use for exploratory analyses that require systematic changes to the software or parameters used.
To automate the development of pipelines, we investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY), which includes a knowledge base in which the capabilities of bioinformatics software are explicitly and formally encoded. BETSY is a backward-chaining, rule-based expert system composed of two parts: a data model that can capture the richness of biological data, and an inference engine that reasons over the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next-generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to pre-processing data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise.
https://github.com/jefftc/changlab.
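The backward-chaining idea described in the abstract can be illustrated with a minimal Python sketch. This is not BETSY's implementation or API. The `DataSpec` and `Rule` classes, the `plan` function, and the tool names `align_reads`, `sort_bam` and `call_variants` are all hypothetical. The sketch assumes each rule maps typed inputs to a single typed output carrying attributes (a toy stand-in for the richer data model). It searches backward from the requested output until it reaches data that are already available.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical, simplified data model: a data type name plus key/value attributes.
@dataclass(frozen=True)
class DataSpec:
    dtype: str
    attrs: Tuple[Tuple[str, str], ...] = ()

    def satisfies(self, goal: "DataSpec") -> bool:
        # Matches if the types agree and every attribute the goal requests has that value.
        mine = dict(self.attrs)
        return self.dtype == goal.dtype and all(mine.get(k) == v for k, v in goal.attrs)


# Hypothetical rule: a tool that turns the listed inputs into one output.
@dataclass(frozen=True)
class Rule:
    tool: str
    inputs: Tuple[DataSpec, ...]
    output: DataSpec


def plan(goal: DataSpec, available: List[DataSpec], rules: List[Rule],
         depth: int = 0, max_depth: int = 10) -> Optional[List[str]]:
    """Backward chaining: return an ordered list of tools that produces `goal`, or None."""
    # Base case: the goal is already satisfied by data the user has.
    if any(d.satisfies(goal) for d in available):
        return []
    # Depth cap guards against cycles in the rule set.
    if depth >= max_depth:
        return None
    for rule in rules:
        if not rule.output.satisfies(goal):
            continue
        steps: List[str] = []
        for sub_goal in rule.inputs:
            sub_steps = plan(sub_goal, available, rules, depth + 1, max_depth)
            if sub_steps is None:
                break  # this rule's inputs cannot be produced; try the next rule
            for s in sub_steps:
                # Simplification: shared upstream steps are merged by tool name.
                if s not in steps:
                    steps.append(s)
        else:
            return steps + [rule.tool]
    return None


# Toy sequencing example with hypothetical tool names.
rules = [
    Rule("align_reads", (DataSpec("FastqFiles"),), DataSpec("BamFile", (("sorted", "no"),))),
    Rule("sort_bam", (DataSpec("BamFile", (("sorted", "no"),)),), DataSpec("BamFile", (("sorted", "yes"),))),
    Rule("call_variants", (DataSpec("BamFile", (("sorted", "yes"),)),), DataSpec("VcfFile")),
]
workflow = plan(DataSpec("VcfFile"), [DataSpec("FastqFiles")], rules)
print(workflow)  # ['align_reads', 'sort_bam', 'call_variants']
```

This sketch returns the first workflow it finds. An expert system intended for exploratory analysis would more likely enumerate alternative workflows, for example one per alternative tool or parameter choice, so that they can be compared systematically.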