PanTools: representation, storage and exploration of pan-genomic data.

Author information

Sheikhizadeh Siavash, Schranz M Eric, Akdel Mehmet, de Ridder Dick, Smit Sandra

Affiliations

Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, 6708PB, Wageningen, The Netherlands.

Biosystematics Group, Wageningen University, Droevendaalsesteeg 1, 6708PB, The Netherlands.

Publication information

Bioinformatics. 2016 Sep 1;32(17):i487-i493. doi: 10.1093/bioinformatics/btw455.

Abstract

MOTIVATION

Next-generation sequencing technology is generating a wealth of highly similar genome sequences for many species, paving the way for a transition from single-genome to pan-genome analyses. Accordingly, genomics research is going to switch from reference-centric to pan-genomic approaches. We define the pan-genome as a comprehensive representation of multiple annotated genomes, facilitating analyses on the similarity and divergence of the constituent genomes at the nucleotide, gene and genome structure level. Current pan-genomic approaches do not thoroughly address scalability, functionality and usability.

RESULTS

We introduce a generalized De Bruijn graph as a pan-genome representation, as well as an online algorithm to construct it. This representation is stored in a Neo4j graph database, which makes our approach scalable to large eukaryotic genomes. Besides the construction algorithm, our software package, called PanTools, currently provides functionality for annotating pan-genomes, adding sequences, grouping genes, retrieving gene sequences or genomic regions, reconstructing genomes and comparing and querying pan-genomes. We demonstrate the performance of the tool using datasets of 62 E. coli genomes, 93 yeast genomes and 19 Arabidopsis thaliana genomes.
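As a rough illustration of the idea behind a De Bruijn graph that spans several genomes and can grow as sequences are added, the sketch below may help. It is not the PanTools construction algorithm: it is a minimal, uncompressed toy in Python that ignores reverse complements, keeps everything in memory rather than in Neo4j, and uses hypothetical names (`add_genome`, `shared_kmers`, `reconstruct`) chosen only for this example.

```python
from collections import defaultdict


def add_genome(nodes, edges, name, seq, k):
    """Add one genome to an existing graph (incremental, toy version).

    nodes: k-mer -> list of (genome_name, offset) occurrences
    edges: k-mer -> set of successor k-mers seen in any genome
    """
    prev = None
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        nodes[kmer].append((name, i))
        if prev is not None:
            edges[prev].add(kmer)
        prev = kmer


def shared_kmers(nodes, n_genomes):
    """Return the k-mers that occur in every one of n_genomes genomes."""
    return sorted(
        kmer for kmer, hits in nodes.items()
        if len({genome for genome, _ in hits}) == n_genomes
    )


def reconstruct(nodes, name):
    """Rebuild one genome from the occurrence lists stored on the nodes."""
    hits = sorted(
        (offset, kmer)
        for kmer, occurrences in nodes.items()
        for genome, offset in occurrences
        if genome == name
    )
    if not hits:
        return ""
    # First k-mer in full, then the last base of each following k-mer.
    return hits[0][1] + "".join(kmer[-1] for _, kmer in hits[1:])


# Hypothetical toy input: two short, highly similar "genomes".
nodes, edges = defaultdict(list), defaultdict(set)
add_genome(nodes, edges, "g1", "ACGTAC", k=3)
add_genome(nodes, edges, "g2", "ACGTTC", k=3)
```

In this toy, k-mers shared by both sequences become single nodes carrying two occurrence records, the point where the sequences diverge shows up as a node with two successors, and each input can be recovered from the stored offsets. These loosely mirror the genome comparison and reconstruction functions listed above, under the stated simplifications.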

AVAILABILITY AND IMPLEMENTATION

The Java implementation of PanTools is publicly available at http://www.bif.wur.nl.

CONTACT

sandra.smit@wur.nl.
