Chin Pei-Ju, Bhavsar Jaysheel D, Bosma Trent J, MacDonald Madolyn L, Polson Shawn W, Khan Arifa S
Division of Viral Products, Office of Vaccines Research and Review, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA.
Center for Bioinformatics and Computational Biology, Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, USA.
mSphere. 2025 Jul 29;10(7):e0028625. doi: 10.1128/msphere.00286-25. Epub 2025 Jun 23.
All biological products are required to demonstrate the absence of adventitious viruses (AVs), which may be inadvertently introduced at different steps of the manufacturing process. The currently recommended in vivo and in vitro virus detection assays have limitations for broad detection and are lengthy and laborious. Additionally, the use of animals is discouraged by the global 3Rs initiative for replacement, reduction, and refinement. High-throughput or next-generation sequencing (HTS/NGS) technologies can rapidly detect known and novel viruses in biological materials. There are, however, challenges for HTS detection of AVs due to the differential abundance of viral sequences in public databases, which led to the creation of a non-redundant Reference Viral Database (RVDB) containing all viral, viral-like, and viral-related sequences, with a reduced cellular sequence content. In this paper, we describe improvements in RVDB, which include the transition of the RVDB production scripts from the original Python 2 codebase to Python 3, updates to the semantic pipeline to remove misannotated non-viral sequences and irrelevant viral sequences, the use of taxonomy for the removal of phages, and the inclusion of a quality-check step for SARS-CoV-2 genomes to exclude low-quality sequences. Additionally, RVDB website updates include search tools for exploring the database sequences and an automatic pipeline that provides annotation information to distinguish non-viral and viral sequences in the database. These updates for refining RVDB are expected to enhance HTS bioinformatics by reducing the computational time and increasing the accuracy of virus detection.
IMPORTANCE
High-throughput sequencing (HTS) has emerged as an advanced technology for demonstrating the safety of biological products. HTS can be used as an alternative adventitious virus detection method for replacing the currently recommended in vivo and PCR assays and for supplementing or replacing the in vitro cell culture assays. However, HTS bioinformatics analysis for broad virus detection, including both known and novel viruses, depends on a comprehensive and accurately annotated database. In this study, we have refined our original comprehensive Reference Viral Database (RVDB) for greater accuracy of virus detection with a reduced computational burden. Additionally, the production script for automating the generation of RVDB was updated to facilitate reliable database production and timely availability.
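As a rough illustration of two of the refinement steps described above (taxonomy-based removal of phages and a quality check on SARS-CoV-2 genomes), the following Python 3 sketch filters a stream of sequence records. It is not the RVDB production code: the `Record` structure, the function names, the length and ambiguity thresholds, and the single phage taxon listed are all assumptions chosen for illustration only.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Record(NamedTuple):
    """Minimal sequence record (hypothetical structure, for illustration)."""
    header: str
    lineage: Tuple[str, ...]  # taxonomic lineage names, root first
    sequence: str


# Assumed values; these are NOT taken from the RVDB scripts.
MIN_GENOME_LENGTH = 29_000        # assumed minimum SARS-CoV-2 genome length
MAX_AMBIGUOUS_FRACTION = 0.01     # assumed tolerated share of non-ACGT bases
PHAGE_TAXA = frozenset({"Caudoviricetes"})  # example phage taxon only


def ambiguous_fraction(sequence: str) -> float:
    """Return the fraction of bases that are not A, C, G, or T."""
    if not sequence:
        return 1.0
    upper = sequence.upper()
    unambiguous = sum(upper.count(base) for base in "ACGT")
    return 1.0 - unambiguous / len(upper)


def passes_quality_check(sequence: str) -> bool:
    """Accept a genome only if it is long enough and has few ambiguous bases."""
    return (
        len(sequence) >= MIN_GENOME_LENGTH
        and ambiguous_fraction(sequence) <= MAX_AMBIGUOUS_FRACTION
    )


def is_phage(lineage: Iterable[str]) -> bool:
    """Flag a record whose lineage contains any of the listed phage taxa."""
    return any(name in PHAGE_TAXA for name in lineage)


def filter_records(records: Iterable[Record]) -> Iterator[Record]:
    """Drop phage records and low-quality SARS-CoV-2 genomes; keep the rest."""
    for record in records:
        if is_phage(record.lineage):
            continue
        if "SARS-CoV-2" in record.header and not passes_quality_check(record.sequence):
            continue
        yield record
```

Calling `list(filter_records(records))` on an iterable of `Record` objects returns only the records that survive both filters. Matching on the header string and on a fixed set of taxon names is a deliberate simplification; a production pipeline would more plausibly resolve taxonomy identifiers against a taxonomy database.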