Department of Digital Technologies, University of Mauritius, Reduit, Mauritius.
Australian Centre for Ancient DNA, University of Adelaide, Adelaide, South Australia, Australia.
BMC Bioinformatics. 2018 Nov 29;19(1):457. doi: 10.1186/s12859-018-2446-1.
The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust that aims to use genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure, such as portable and reproducible bioinformatics workflows for use in heterogeneous African computing environments. Processing and analyzing genomic data is a big data application that requires complex, interdependent data analysis workflows. Such workflows take the primary and secondary input data through several computationally intensive processing steps using different software packages, with the outputs of some steps forming the inputs of others. Implementing workflows that are scalable, reproducible, portable and easy to use is particularly challenging.
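The interdependence described above, where some steps consume the outputs of others, can be pictured as a directed acyclic graph of steps. The following is a minimal sketch in Python, not the H3ABioNet implementation (the actual workflows are written in CWL and Nextflow). The step names and the `execution_order` helper are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical step names, loosely modelled on a variant-calling workflow.
# Each step maps to the set of steps whose outputs it consumes.
dependencies = {
    "quality_control": set(),
    "align_reads": {"quality_control"},
    "mark_duplicates": {"align_reads"},
    "call_variants": {"mark_duplicates"},
    "annotate_variants": {"call_variants"},
    "summary_report": {"quality_control", "call_variants"},
}


def execution_order(deps):
    """Return one valid order in which every step runs after its inputs exist."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
for step in order:
    print(step)
```

A workflow engine performs this kind of ordering automatically from the declared inputs and outputs of each step, which is one reason such frameworks help with reproducibility.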
H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, together with US and European collaborators. Two of the workflows are built using the Common Workflow Language (CWL) framework and two using Nextflow. All the workflows are containerized using Docker for improved portability and reproducibility, and are publicly available for use by members of the H3Africa consortium and the international research community.
The H3ABioNet workflows have been designed to offer ease of use for the end user and high levels of reproducibility and portability, while following modern, state-of-the-art bioinformatics data processing protocols. The workflows are already in use and will continue to serve the H3Africa consortium projects. All four workflows are also publicly available for research scientists worldwide to use and adapt to their own needs. The H3ABioNet workflows will help develop bioinformatics capacity, assist genomics research within Africa, and increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.