Towards Machine-FAIR: Representing software and datasets to facilitate reuse and scientific discovery by machines.

Affiliations

Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA.

Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, USA.

Publication information

J Biomed Inform. 2024 Jun;154:104647. doi: 10.1016/j.jbi.2024.104647. Epub 2024 Apr 30.

DOI:10.1016/j.jbi.2024.104647

PMID:38692465

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11250896/

Abstract

OBJECTIVE

To use software, datasets, and data formats in the domain of Infectious Disease Epidemiology as a test collection to evaluate a novel M1 use case, which we introduce in this paper. M1 is a machine that, upon receipt of a new digital object of research, exhaustively finds all valid compositions of it with existing objects.

METHOD

We implemented a data-format-matching-only M1 using exhaustive search, which we refer to as M1. We then ran M1 on the test collection and used error analysis to identify needed semantic constraints.
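A minimal sketch of what data-format-matching-only exhaustive search could look like is given here. It is an illustration under stated assumptions, not the authors' implementation: the `DigitalObject` schema, the `find_compositions` function, and all object and format names are hypothetical. A composition is treated as valid whenever one object's output format equals another object's input format, with no semantic checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalObject:
    """A research object described only by its data formats (hypothetical schema)."""
    name: str
    inputs: frozenset = frozenset()   # formats the object can consume
    outputs: frozenset = frozenset()  # formats the object produces or is stored in


def find_compositions(new_obj, repository):
    """Pair new_obj with every repository object, in both directions, and keep
    each pair where an output format equals an input format."""
    matches = []
    for other in repository:
        for producer, consumer in ((new_obj, other), (other, new_obj)):
            for fmt in sorted(producer.outputs & consumer.inputs):
                matches.append((producer.name, fmt, consumer.name))
    return matches


# Illustrative objects and format names, not taken from the paper's test collection.
repository = [
    DigitalObject("case_counts_dataset", outputs=frozenset({"csv-case-counts"})),
    DigitalObject(
        "epidemic_simulator",
        inputs=frozenset({"xml-scenario"}),
        outputs=frozenset({"csv-case-counts"}),
    ),
]
new_tool = DigitalObject("outbreak_plotter", inputs=frozenset({"csv-case-counts"}))

compositions = find_compositions(new_tool, repository)
for producer, fmt, consumer in compositions:
    print(f"{producer} --[{fmt}]--> {consumer}")
```

Because only format names are compared, a search of this kind returns every syntactically compatible pair, including pairs that are semantically invalid. Those are the errors that the error analysis is meant to expose.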

RESULTS

Precision of M1 search was 61.7%. Error analysis identified needed semantic constraints and needed changes in the handling of data services. Most semantic constraints were simple, but one data format was complex enough that representing semantic constraints over it was practically impossible. From this we conclude that software developers will have to meet the machines halfway by engineering software whose inputs are simple enough for their semantic constraints to be represented, akin to the simple APIs of services. We summarize these insights as M1-FAIR guiding principles for composability and suggest a roadmap for progressively capable devices in the service of reuse and accelerated scientific discovery.
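To make the link between precision and semantic constraints concrete, the following toy sketch computes precision over a set of format-only candidates and then filters them with one simple constraint. Everything in it is an assumption for illustration: the candidate tuples, the `time_step` metadata field, and the constraint itself are hypothetical, and the constraint doubles as the validity judgment. The numbers bear no relation to the 61.7% reported above.

```python
def precision(candidates, is_valid):
    """Fraction of returned compositions judged valid (0.0 if none returned)."""
    if not candidates:
        return 0.0
    return sum(1 for c in candidates if is_valid(c)) / len(candidates)


# Toy candidates from format-only matching: (producer, consumer, metadata).
# All names and metadata fields are hypothetical.
candidates = [
    ("daily_counts", "forecaster", {"time_step": "day"}),
    ("weekly_counts", "forecaster", {"time_step": "week"}),
    ("monthly_counts", "forecaster", {"time_step": "month"}),
    ("daily_counts_b", "forecaster", {"time_step": "day"}),
]


def satisfies_constraint(candidate):
    """Hypothetical semantic constraint: the consumer requires daily data."""
    _, _, metadata = candidate
    return metadata["time_step"] == "day"


# Precision before and after applying the constraint as a filter.
format_only_precision = precision(candidates, satisfies_constraint)
filtered = [c for c in candidates if satisfies_constraint(c)]
filtered_precision = precision(filtered, satisfies_constraint)

print(format_only_precision, filtered_precision)
```

A constraint this simple is a one-line predicate over metadata. The difficulty described above arises when an input format is so complex that no comparably compact predicate can be written for it.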

CONCLUSION

Algorithmic search of digital repositories for valid workflow compositions has the potential to accelerate scientific discovery, but it requires a scalable solution to the problem of acquiring knowledge about semantic constraints on software inputs. Additionally, practical limits on the logical complexity of semantic constraints must be respected, which has implications for the design of software.
