
The Venus score for the assessment of the quality and trustworthiness of biomedical datasets.

Authors

Chicco Davide, Fabris Alessandro, Jurman Giuseppe

Affiliations

Università di Milano-Bicocca & University of Toronto, Toronto, Canada.

Max Planck Institute for Security and Privacy, Bochum, Germany.

Publication information

BioData Min. 2025 Jan 9;18(1):1. doi: 10.1186/s13040-024-00412-x.

DOI:10.1186/s13040-024-00412-x
PMID:39780220
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11716409/
Abstract

Biomedical datasets are the mainstays of computational biology and health informatics projects, and can be found on multiple data platforms online or obtained from wet-lab biologists and physicians. The quality and the trustworthiness of these datasets, however, can sometimes be poor, producing bad results in turn, which can harm patients and data subjects. To address this problem, policy-makers, researchers, and consortia have proposed diverse regulations, guidelines, and scores to assess the quality and increase the reliability of datasets. Although generally useful, however, they are often incomplete and impractical. The guidelines of Datasheets for Datasets, in particular, are too numerous; the requirements of the Kaggle Dataset Usability Score focus on non-scientific requisites (for example, including a cover image); and the European Union Artificial Intelligence Act (EU AI Act) sets forth sparse and general data governance requirements, which we tailored to datasets for biomedical AI. Against this backdrop, we introduce our new Venus score to assess the data quality and trustworthiness of biomedical datasets. Our score ranges from 0 to 10 and consists of ten questions that anyone developing a bioinformatics, medical informatics, or cheminformatics dataset should answer before the release. In this study, we first describe the EU AI Act, Datasheets for Datasets, and the Kaggle Dataset Usability Score, presenting their requirements and their drawbacks. To do so, we reverse-engineer the weights of the influential Kaggle Score for the first time and report them in this study. We distill the most important data governance requirements into ten questions tailored to the biomedical domain, comprising the Venus score. We apply the Venus score to twelve datasets from multiple subdomains, including electronic health records, medical imaging, microarray and bulk RNA-seq gene expression, cheminformatics, physiologic electrogram signals, and medical text. 
Analyzing the results, we surface fine-grained strengths and weaknesses of popular datasets, as well as aggregate trends. Most notably, we find a widespread tendency to gloss over sources of data inaccuracy and noise, which may hinder the reliable exploitation of data and, consequently, research results. Overall, our results confirm the applicability and utility of the Venus score to assess the trustworthiness of biomedical data.
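The abstract states only that the score ranges from 0 to 10 and is built from ten questions. As a minimal sketch of how such a checklist could be tallied, the Python below assumes that each question is answered yes or no and contributes one point when satisfied. That equal weighting, the `Answer` type, the `checklist_score` function, and the use of numeric question indices in place of the actual question texts are all illustrative assumptions, not the authors' published method.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: ten questions, matching the count stated in the abstract.
NUM_QUESTIONS = 10


@dataclass(frozen=True)
class Answer:
    """One yes/no answer to a checklist question (hypothetical type)."""
    question_id: int  # placeholder index 1..10; the question texts are not in the abstract
    satisfied: bool


def checklist_score(answers: Iterable[Answer]) -> int:
    """Return a 0-10 score, assuming one point per satisfied question.

    Equal weighting is an assumption made for illustration only.
    """
    answers = list(answers)
    ids = sorted(a.question_id for a in answers)
    if ids != list(range(1, NUM_QUESTIONS + 1)):
        raise ValueError("expected exactly one answer for each question 1..10")
    return sum(1 for a in answers if a.satisfied)


# Usage: a hypothetical dataset that satisfies every question except 4 and 9.
example_answers = [Answer(q, q not in (4, 9)) for q in range(1, NUM_QUESTIONS + 1)]
example_score = checklist_score(example_answers)
```

Requiring exactly one answer per question keeps an incomplete assessment from passing as a low score, which matters if scores are compared across datasets.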


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5aff/11716409/ae142c0d38e5/13040_2024_412_Fig1_HTML.jpg

