Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia.
Jožef Stefan International Postgraduate School, Ljubljana, Slovenia.
Sci Rep. 2022 May 4;12(1):7267. doi: 10.1038/s41598-022-11316-3.
Multilabel classification (MLC) is a machine learning task in which the goal is to learn to assign multiple labels to an example simultaneously. It is receiving increasing interest from the machine learning community, as evidenced by the growing number of papers and methods appearing in the literature. Hence, ensuring proper, correct, robust, and trustworthy benchmarking is of utmost importance for the further development of the field. We believe that this can be achieved by adhering to recently emerged data management standards, such as the FAIR (Findable, Accessible, Interoperable, and Reusable) and TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) principles. Following these principles, we introduce an ontology-based online catalogue of MLC datasets originating from various application domains. The catalogue extensively describes many MLC datasets with comprehensible meta-features, MLC-specific semantic descriptions, and data provenance information. The MLC data catalogue is available at: http://semantichub.ijs.si/MLCdatasets .