Attafi Omar Abdelghani, Clementel Damiano, Kyritsis Konstantinos, Capriotti Emidio, Farrell Gavin, Fragkouli Styliani-Christina, Castro Leyla Jael, Hatos András, Lenaerts Tom, Mazurenko Stanislav, Mozaffari Soroush, Pradelli Franco, Ruch Patrick, Savojardo Castrense, Turina Paola, Zambelli Federico, Piovesan Damiano, Monzon Alexander Miguel, Psomopoulos Fotis, Tosatto Silvio C E
Department of Biomedical Sciences, University of Padova, Padova 35131, Italy.
Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki 570 01, Greece.
Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae094.
Supervised machine learning (ML) is used extensively in biology and deserves closer scrutiny. The Data, Optimization, Model, Evaluation (DOME) recommendations aim to enhance the validation and reproducibility of ML research by establishing standards for key aspects such as data handling and processing, optimization, evaluation, and model interpretability. By providing a structured set of questions, the recommendations help ensure that key details are reported transparently. Here, we introduce the DOME registry (URL: registry.dome-ml.org), a database that allows scientists to manage and access comprehensive DOME-related information on published ML studies. The registry uses external resources such as ORCID, APICURON, and the Data Stewardship Wizard to streamline the annotation process and ensure comprehensive documentation. By assigning unique identifiers and DOME scores to publications, the registry fosters a standardized evaluation of ML methods. Future plans include continuing to grow the registry through community curation, improving the DOME score definition, encouraging publishers to adopt DOME standards, and promoting transparency and reproducibility of ML in the life sciences.