Department of Mathematical, Physical and Computer Sciences, University of Parma, Parco Area delle Scienze 53/a (Campus), Parma, 43124, PR, Italy.
Department of Computer Science, University of Verona, Strada le Grazie, 15, Verona, 37134, VR, Italy.
J Biomed Inform. 2023 Dec;148:104552. doi: 10.1016/j.jbi.2023.104552. Epub 2023 Nov 22.
Pangenomics was originally defined as the problem of comparing how genes group into gene families across a set of bacterial isolates belonging to the same species. The problem requires computing sequence homology among these genes. When combined with metagenomics, namely in human microbiome composition analysis, gene-oriented pangenome detection becomes a promising method for deciphering ecosystem functions and population-level evolution. Established computational tools can investigate the genetic content of isolates for which a complete genomic sequence is available. However, public resources also hold a plethora of incomplete genomes, which only a few tools can analyze. Here, incomplete means that the reconstruction of the genomic sequence has not been finished, so only fragments of the sequence are currently available. The information contained in these fragments may nevertheless play an essential role in the analyses. We present PanDelos-frags, a computational tool that builds on and extends previous results on the analysis of complete genomes. It provides a new methodology for inferring missing genetic information and thus for managing incomplete genomes. PanDelos-frags outperforms state-of-the-art approaches in reconstructing gene families on synthetic benchmarks and in a real metagenomics use case. PanDelos-frags is publicly available at https://github.com/InfOmics/PanDelos-frags.