Vorbrugg Sebastian, Bezrukov Ilja, Bao Zhigui, Weigel Detlef
Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btae755.
Genome graphs are powerful data structures for representing the genetic diversity within populations, and they can help identify genomic variations that traditional linear references miss. However, their complexity and size make them challenging to analyze. We sought to develop a genome graph analysis tool that makes these analyses more accessible by addressing the limitations of existing tools. Specifically, we improve scalability and user-friendliness, and we provide many new statistics tailored to variation graphs for graph evaluation, including sample-specific features.
We developed gretl, an efficient, comprehensive, and integrated tool that provides a wide range of statistics for analyzing genome graphs and gaining insights into their structure and composition. gretl can be used to evaluate different graphs, compare the output of graph construction pipelines run with different parameters, and perform an in-depth analysis of individual graphs, including sample-specific analysis. With the assistance of gretl, novel patterns of genetic variation and potential regions of interest can be identified for later, more detailed inspection. We demonstrate that gretl outperforms other tools in terms of speed, particularly for larger genome graphs.
Commented Rust source code and documentation are available under the MIT license at https://github.com/MoinSebi/gretl, together with Python scripts and step-by-step usage examples. The package is available on Bioconda for easy installation.