CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France.
National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark.
Gigascience. 2021 Jan 27;10(1). doi: 10.1093/gigascience/giaa157.
Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description (and cataloguing) of bioinformatics resources.
Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability.
biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.
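To make the idea of a description that carries both syntax and semantics more concrete, the following is a minimal sketch in Python. It is not the actual biotoolsSchema: the class names, field names, the example tool "ExampleAligner", and its URL are all hypothetical, chosen only to illustrate how a tool's function might be annotated with a semantic data type alongside its syntactic formats, and then serialized to a machine-readable form.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class DataAnnotation:
    # Hypothetical: semantic type of the data (what it is).
    data_type: str
    # Hypothetical: syntactic formats the data may take (how it is encoded).
    formats: List[str] = field(default_factory=list)


@dataclass
class Function:
    # Hypothetical: the operation the tool performs, with its inputs and outputs.
    operation: str
    inputs: List[DataAnnotation] = field(default_factory=list)
    outputs: List[DataAnnotation] = field(default_factory=list)


@dataclass
class ToolDescription:
    # Hypothetical top-level record; the real schema defines its own attributes.
    name: str
    description: str
    homepage: str
    functions: List[Function] = field(default_factory=list)


REQUIRED_TEXT_FIELDS = ("name", "description", "homepage")


def missing_fields(tool: ToolDescription) -> List[str]:
    """Return the required text fields that are empty or whitespace-only."""
    return [f for f in REQUIRED_TEXT_FIELDS if not getattr(tool, f).strip()]


def to_json(tool: ToolDescription) -> str:
    """Serialize the description to JSON so that software can consume it."""
    return json.dumps(asdict(tool), indent=2)


# Illustrative record only; this tool and URL do not exist.
example_tool = ToolDescription(
    name="ExampleAligner",
    description="Aligns pairs of sequences.",
    homepage="https://example.org/aligner",
    functions=[
        Function(
            operation="Sequence alignment",
            inputs=[DataAnnotation("Sequence", ["FASTA"])],
            outputs=[DataAnnotation("Sequence alignment", ["FASTA", "ClustalW format"])],
        )
    ],
)

if __name__ == "__main__":
    print(missing_fields(example_tool))
    print(to_json(example_tool))
```

Keeping the data type separate from the format list mirrors the distinction the abstract draws between semantic and syntactic description: two tools can be compared on what they consume and produce even when they differ in file format.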