Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany.
BMC Bioinformatics. 2019 Jul 22;20(1):402. doi: 10.1186/s12859-019-2982-3.
A variety of phylogenetic file formats exists today. Some are well established but limited in their data model, while more recently introduced ones offer advanced features for metadata representation. Most currently available software supports only the classical formats with their limited metadata model, although support for the more advanced formats would be desirable: users need it to produce richly annotated data that can be reused efficiently and to make the underlying workflows easily reproducible. A programming library that abstracts over the data and metadata models of the different formats, and allows all of them to be supported in one step, would significantly simplify both the development of new software and the extension of existing software to meet the need for better metadata annotation.
We developed the Java library JPhyloIO, which allows event-based reading and writing of the most common alignment and tree/network formats. It gives full access to all features of the nine currently supported formats. By implementing a single JPhyloIO-based reader and writer, application developers can support all of these formats. Owing to its event-based architecture, JPhyloIO can be combined with any application data structure and is memory efficient for large datasets. JPhyloIO is distributed under the LGPL. Detailed documentation and example applications (available at http://bioinfweb.info/JPhyloIO/) significantly lower the entry barrier for bioinformaticians who wish to benefit from JPhyloIO's features in their own software.
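To illustrate what event-based reading means in practice, the following is a minimal, self-contained sketch of the general pattern. It does not use JPhyloIO's actual API: `ToyFastaEventReader`, `Event`, `Kind`, and `readInto` are hypothetical names invented for this illustration, and the toy reader handles only a simplified FASTA-like input. The point is that the reader emits a stream of small events and the application decides how to store them, so no intermediate object model of the whole file has to be held in memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

public class EventReaderSketch {
    // Hypothetical event kinds for this sketch; not JPhyloIO's event types.
    enum Kind { SEQUENCE_START, SEQUENCE_TOKENS, SEQUENCE_END }

    // A small immutable event: its kind plus an optional text payload.
    static final class Event {
        final Kind kind;
        final String text;
        Event(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Toy pull-style reader that turns FASTA-like text into a stream of events.
    static final class ToyFastaEventReader {
        private final String[] lines;
        private int pos = 0;
        private boolean inSequence = false;
        private final Deque<Event> pending = new ArrayDeque<>();

        ToyFastaEventReader(String fasta) { this.lines = fasta.split("\\R"); }

        boolean hasNextEvent() {
            fill();
            return !pending.isEmpty();
        }

        // Throws NoSuchElementException if no further event is available.
        Event next() {
            fill();
            return pending.remove();
        }

        // Reads just enough input lines to produce at least one pending event.
        private void fill() {
            while (pending.isEmpty() && pos < lines.length) {
                String line = lines[pos++].trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.startsWith(">")) {
                    if (inSequence) {
                        pending.add(new Event(Kind.SEQUENCE_END, ""));
                    }
                    pending.add(new Event(Kind.SEQUENCE_START, line.substring(1).trim()));
                    inSequence = true;
                } else if (inSequence) {
                    pending.add(new Event(Kind.SEQUENCE_TOKENS, line));
                }
            }
            // Close the last open sequence once the input is exhausted.
            if (pending.isEmpty() && pos >= lines.length && inSequence) {
                pending.add(new Event(Kind.SEQUENCE_END, ""));
                inSequence = false;
            }
        }
    }

    // Application side: consume events and fill an application-chosen structure.
    static Map<String, String> readInto(String fasta) {
        ToyFastaEventReader reader = new ToyFastaEventReader(fasta);
        Map<String, String> model = new LinkedHashMap<>();
        String current = null;
        while (reader.hasNextEvent()) {
            Event e = reader.next();
            switch (e.kind) {
                case SEQUENCE_START:
                    current = e.text;
                    model.put(current, "");
                    break;
                case SEQUENCE_TOKENS:
                    model.merge(current, e.text, String::concat);
                    break;
                case SEQUENCE_END:
                    current = null;
                    break;
            }
        }
        return model;
    }

    public static void main(String[] args) {
        System.out.println(readInto(">A\nAC\nGT\n>B\nTT\n"));
    }
}
```

In this sketch the application stores sequences in a plain `Map`, but the same event loop could just as well feed any other data structure, which is the property the event-based design is meant to provide.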
JPhyloIO simplifies the development of new applications, and the extension of existing ones, that support various standard formats simultaneously. This has the potential to improve interoperability between phylogenetic software tools and, at the same time, to encourage the use of more recent metadata-rich formats such as NeXML or phyloXML.