Division of Pediatric Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
Genome Biol. 2020 Mar 5;21(1):58. doi: 10.1186/s13059-020-01965-w.
To understand diversity in enormous collections of genome sequences, we need computationally scalable tools that can quickly contextualize individual genomes based on their similarities and identify the features that make each genome unique. We present WhatsGNU, a tool based on exact-match proteomic compression that, in seconds, classifies any new genome and provides a detailed report of protein alleles that may have novel functional differences. We use this technique to characterize the total allelic diversity (panallelome) of Salmonella enterica, Mycobacterium tuberculosis, Pseudomonas aeruginosa, and Staphylococcus aureus. The approach could be extended to other species. WhatsGNU is available from https://github.com/ahmedmagds/WhatsGNU.
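The idea of exact-match proteomic compression can be illustrated with a minimal sketch. It is not WhatsGNU's actual implementation: the function names (`build_index`, `allele_report`, `closest_genomes`) and the toy data layout are hypothetical, chosen only for illustration. The sketch assumes that each distinct protein sequence is stored once together with the set of database genomes that carry it, so that a new genome can be scored by dictionary lookups alone.

```python
from collections import Counter, defaultdict


def build_index(genomes):
    """Compress a database into {protein sequence: set of genome names}.

    `genomes` maps a genome name to its list of protein sequences.
    Identical sequences collapse into a single key (hypothetical layout).
    """
    index = defaultdict(set)
    for name, proteins in genomes.items():
        for seq in proteins:
            index[seq].add(name)
    return index


def allele_report(index, query_proteins):
    """Return (sequence, number of database genomes with an exact match).

    A count of 0 marks an allele not seen in the database, i.e. a
    candidate for novel functional differences.
    """
    return [(seq, len(index.get(seq, ()))) for seq in query_proteins]


def closest_genomes(index, query_proteins, top=3):
    """Rank database genomes by how many query alleles they share exactly."""
    votes = Counter()
    for seq in set(query_proteins):
        votes.update(index.get(seq, ()))
    return votes.most_common(top)
```

Under these assumptions, classifying a new genome costs one hash lookup per protein, which is consistent with the abstract's claim of reports produced in seconds; alleles with a zero or low count are the ones a report would flag as potentially novel.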