Dommer Jennifer, Van Doorslaer Koenraad, Afrasiabi Cyrus, Browne Kristen, Ezeji Sam, Kim Lewis, Dolan Michael, McBride Alison A
Bioinformatics and Computational Biosciences Branch (BCBB), National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA.
Department of Immunobiology, College of Medicine, BIO5 Institute, University of Arizona, Tucson, AZ, USA.
J Mol Biol. 2025 Aug 1;437(15):168925. doi: 10.1016/j.jmb.2024.168925. Epub 2024 Dec 26.
The Papilloma Virus Episteme (PaVE; https://pave.niaid.nih.gov/) was initiated by NIAID in 2008 to provide a highly curated bioinformatic and knowledge resource for the papillomavirus scientific community. It rapidly became the core resource for papillomavirus researchers and clinicians worldwide. Over time, the software infrastructure became severely outdated. In PaVE 2.0, the underlying libraries and hosting platform have been completely upgraded and rebuilt using Amazon Web Services (AWS) tools, with automated CI/CD (continuous integration and deployment) pipelines that deploy both the application and the data (now in AWS S3 cloud storage). PaVE 2.0 is hosted on three AWS ECS (Elastic Container Service) using the NIAID Operations & Engineering Branch's Monarch tech stack and Terraform. A new Celery queue supports longer-running tasks. The framework is Python Flask with a JavaScript/Jinja template front end, and the database was switched from MySQL to Neo4j. A Swagger API (Application Programming Interface) performs database queries and executes jobs for BLAST, MAFFT, and the L1 typing tool, and will allow future programmatic data access. All major tools, including BLAST, the L1 typing tool, the genome locus viewer, the phylogenetic tree generator, multiple sequence alignment, and the protein structure viewer, were modernized and enhanced to support more users. Multiple sequence alignment now uses MAFFT instead of COBALT. The protein structure viewer was changed from Jmol to Mol*, the new embeddable viewer used by RCSB (Research Collaboratory for Structural Bioinformatics). In summary, PaVE 2.0 allows us to continue to provide this essential resource with an open-source framework that could be used as a template for molecular biology databases of other viruses.
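The submit-then-poll pattern that a task queue gives long-running alignment jobs can be sketched roughly as follows. This is an illustration only, not PaVE's code: it swaps Celery for an in-process thread pool from the Python standard library so that it runs without a message broker, and `build_mafft_command`, `run_mafft`, and `JobQueue` are hypothetical names. The status strings loosely mirror Celery's state names, and `--auto` and `--thread` are standard MAFFT options.

```python
import shutil
import subprocess
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def build_mafft_command(fasta_path: str, threads: int = 1) -> List[str]:
    """Build a MAFFT invocation; MAFFT writes the alignment to stdout."""
    return ["mafft", "--auto", "--thread", str(threads), fasta_path]


def run_mafft(fasta_path: str) -> str:
    """Run MAFFT and return the aligned FASTA text (needs mafft on PATH)."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    result = subprocess.run(
        build_mafft_command(fasta_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


class JobQueue:
    """In-process stand-in for a task queue (hypothetical, not Celery)."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, func: Callable[..., str], *args: str) -> str:
        """Queue a job and return an id immediately, without blocking."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id: str) -> str:
        """Report PENDING until the job finishes, then SUCCESS or FAILURE."""
        future = self._jobs[job_id]
        if not future.done():
            return "PENDING"
        return "FAILURE" if future.exception() else "SUCCESS"

    def result(self, job_id: str, timeout: Optional[float] = None) -> str:
        """Block until the job finishes and return its output."""
        return self._jobs[job_id].result(timeout=timeout)
```

A caller would submit `run_mafft` with a FASTA path, hand the returned job id back to the client right away, and let the client poll `status` until the alignment is ready. The same shape applies to other long-running jobs such as BLAST searches.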