Rupprecht Florian, Kai Jason, Shrestha Biraj, Giavasis Steven, Xu Ting, Glatard Tristan, Milham Michael P, Kiar Gregory
Center for Data Analytics, Innovation, and Rigor, Child Mind Institute, 215 East 50th Street, 10022, New York, USA.
Krembil Centre for Neuroinformatics, Centre for Addiction and Mental Health, 250 College Street, M5T 1R8, Toronto, Canada.
bioRxiv. 2025 Jul 30:2025.07.24.666435. doi: 10.1101/2025.07.24.666435.
In many scientific domains, including brain imaging and bioinformatics, established tools expose complex command-line interfaces, which makes these powerful legacy tools difficult to use in modern workflow paradigms. We present (i) Styx, a compiler that generates language-native wrapper functions from static tool metadata, allowing command-line tools to be integrated seamlessly into the data science ecosystem. Alongside Styx, we have created (ii) NiWrap, a collection of more than 1900 neuroimaging command-line function descriptions, as a proof-of-concept implementation. The resulting interfaces, provided in Python, R, and TypeScript (https://github.com/styx-api), significantly reduce the complexity of writing and interpreting software pipelines, particularly when composing workflows across packages with distinct API standards. The compiler architecture of Styx facilitates maintainability and portability across computing environments. As with all metadata-dependent infrastructure, creating sufficient metadata annotations remains a barrier to adoption. Accordingly, NiWrap demonstrates two approaches that lower this barrier: direct extraction from source code and LLM-assisted parsing of documentation. Together, Styx and NiWrap offer a sustainable solution for interfacing diverse command-line tools with modern data science ecosystems. This modular approach enhances reproducibility and efficiency in pipeline development while ensuring portability across programming languages as well as computing environments.
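The idea of compiling static tool metadata into a language-native wrapper can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the descriptor schema, the imaginary `skullstrip` tool and its flags, and the helper names `generate_source` and `compile_wrapper` are hypothetical and do not reproduce the metadata format or API of Styx or NiWrap. The sketch only shows the general technique of emitting a typed function from a description of a command line.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical static descriptor for an imaginary tool. The schema is an
# assumption for this sketch, not the format that Styx consumes.
DESCRIPTOR: dict[str, Any] = {
    "name": "skull_strip",
    "command": "skullstrip",
    "inputs": [
        {"id": "infile", "type": "str", "flag": None, "required": True},
        {"id": "outfile", "type": "str", "flag": "-o", "required": True},
        {"id": "threshold", "type": "float", "flag": "-f", "required": False},
        {"id": "verbose", "type": "bool", "flag": "-v", "required": False},
    ],
}


def generate_source(desc: dict[str, Any]) -> str:
    """Emit Python source for a typed function that builds the tool's argv."""
    required = [i for i in desc["inputs"] if i["required"]]
    optional = [i for i in desc["inputs"] if not i["required"]]

    # Required inputs become positional parameters, optional ones get defaults.
    params = [f"{i['id']}: {i['type']}" for i in required]
    for i in optional:
        if i["type"] == "bool":
            params.append(f"{i['id']}: bool = False")
        else:
            params.append(f"{i['id']}: {i['type']} | None = None")

    lines = [
        "from __future__ import annotations",
        f"def {desc['name']}({', '.join(params)}) -> list[str]:",
        f"    args = [{desc['command']!r}]",
    ]
    # Arguments are appended in the order the descriptor declares them.
    for i in desc["inputs"]:
        name, flag = i["id"], i["flag"]
        if i["type"] == "bool":
            lines += [f"    if {name}:", f"        args.append({flag!r})"]
        elif flag is None:
            lines.append(f"    args.append(str({name}))")
        elif i["required"]:
            lines.append(f"    args += [{flag!r}, str({name})]")
        else:
            lines += [
                f"    if {name} is not None:",
                f"        args += [{flag!r}, str({name})]",
            ]
    lines.append("    return args")
    return "\n".join(lines) + "\n"


def compile_wrapper(desc: dict[str, Any]) -> Callable[..., list[str]]:
    """Execute the generated source and return the resulting function."""
    namespace: dict[str, Any] = {}
    exec(generate_source(desc), namespace)
    return namespace[desc["name"]]


skull_strip = compile_wrapper(DESCRIPTOR)
argv = skull_strip("t1.nii.gz", "brain.nii.gz", threshold=0.5, verbose=True)
print(argv)
# ['skullstrip', 't1.nii.gz', '-o', 'brain.nii.gz', '-f', '0.5', '-v']
```

The generated function only assembles the argument list. Keeping command construction separate from execution is one plausible way to let the same wrapper run under different runners (for example a local subprocess or a container), which is the kind of portability the abstract describes.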