Sun Binhuan, Pashkova Liubov, Pieters Pascal Aldo, Harke Archana Sanjay, Mohite Omkar Satyavan, Santos Alberto, Zielinski Daniel C, Palsson Bernhard O, Phaneuf Patrick Victor
Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220 Søltofts Plads, 2800 Kongens Lyngby, Denmark.
Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, United States.
Nucleic Acids Res. 2025 Jan 6;53(D1):D806-D818. doi: 10.1093/nar/gkae1042.
The exponential growth of microbial genome data presents unprecedented opportunities for unlocking the potential of microorganisms. The burgeoning field of pangenomics offers a framework for extracting insights from this big biological data. Recent advances in microbial pangenomic research have generated substantial data and literature, yielding valuable knowledge across diverse microbial species. PanKB (pankb.org), a knowledgebase designed for microbial pangenomics research and biotechnological applications, was built to capitalize on this wealth of information. PanKB currently includes 51 pangenomes from 8 industrially relevant microbial families, comprising 8402 genomes, over 500 000 genes and over 7 million mutations. To describe these data, PanKB implements four main components: (1) interactive pangenomic analytics to facilitate exploration, intuition, and potential discoveries; (2) alleleomic analytics, a pangenomic-scale analysis of variants, providing insights into intra-species sequence variation and potential mutations for applications; (3) a global search function enabling broad and deep investigations across pangenomes to power research and bioengineering workflows; (4) a bibliome of 833 open-access pangenomic papers and an interface with an LLM that can answer in-depth questions using its knowledge. PanKB empowers researchers and bioengineers to harness the potential of microbial pangenomics and serves as a valuable resource bridging the gap between pangenomic data and practical applications.