Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

Authors

Damkliang Kasikrit, Tandayya Pichaya, Sangket Unitsa, Pasomsub Ekawat

Publication information

J Integr Bioinform. 2016 Nov 28;13(1):287. doi: 10.2390/biecoll-jib-2016-287.

Abstract

Coding sequences (CDS) continue to be discovered, and larger CDS are revealed frequently. Approaches and related tools have been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), together with our own locally deployed web services. The workflow input is a set of CDS in FASTA format. The workflow supports 1,000 to 20,000 bootstrap replicates. It performs tree inference with algorithms of the EMBOSS PHYLIPNEW package, such as Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML), based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two types, using Soaplab2 and Apache Axis2. They are SOAP and Java Web Service (JWS) services that provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Its execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This new integrated automatic workflow will be beneficial to bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
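The abstract does not say how the bootstrap replicates are generated. As a minimal sketch, assuming the standard nonparametric bootstrap used in phylogenetics (resampling alignment columns with replacement), the step could look like the following. The function names, the dictionary representation of the alignment, and the `seed` parameter are hypothetical and are not taken from the paper or from any of its web services; only the 1,000 to 20,000 replicate range comes from the abstract.

```python
import random

# Replicate range stated in the abstract; enforcing it here is an
# illustrative choice, not a description of the authors' implementation.
MIN_REPLICATES = 1_000
MAX_REPLICATES = 20_000


def bootstrap_replicate(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` maps sequence names to aligned sequences of equal length.
    Columns are drawn with replacement, and the same column indices are
    applied to every sequence so that sites stay aligned.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Generate `n_replicates` resampled alignments (hypothetical helper)."""
    if not MIN_REPLICATES <= n_replicates <= MAX_REPLICATES:
        raise ValueError("replicate count outside the range given in the abstract")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]
```

Each replicate would then be passed to a tree-inference method, and the resulting trees summarized into support values; those later steps are outside the scope of this sketch.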