Shao Danying, Kellogg Gretta, Mahony Shaun, Lai William, Pugh B Franklin
Pennsylvania State University, University Park, Pennsylvania.
PEARC20 (2020). 2020 Jul;2020:285-292. doi: 10.1145/3311790.3396621. Epub 2020 Jul 26.
Genome sequencing has advanced rapidly, with developments including high-throughput next-generation sequencing (NGS) technologies, automation of biological experiments, new bioinformatics tools, and the use of high-performance computing and cloud computing. ChIP-based NGS technologies, e.g. ChIP-seq and ChIP-exo, are widely used to detect the binding sites of DNA-interacting proteins in the genome and give us a deeper mechanistic understanding of genomic regulation. As ChIP-based NGS pipelines generate sequencing data at an unprecedented pace, there is an urgent need for a metadata management system. To meet this need, we developed the Platform for Eukaryotic Genomic Regulation (PEGR), a web service platform that logs metadata for samples and sequencing experiments, manages data processing workflows, and provides reporting and visualization. PEGR links together people, samples, protocols, DNA sequencers, and bioinformatics computation. With PEGR, scientists can gain a more integrated view of their sequencing data and better understand the mechanisms of genomic regulation. In this paper, we present the architecture and major functionalities of PEGR. We also share our experience in developing the application and discuss future directions.