Variation benchmark datasets: update, criteria, quality and applications.

Author affiliations

Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden.

School of Computer Science and Technology, Soochow University, No. 1 Shizi Street, Suzhou, 215006 Jiangsu, China.

Publication information

Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baz117.

DOI: 10.1093/database/baz117
PMID: 32016318
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6997940/

Abstract

Development of new computational methods and testing their performance has to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding region, structure mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level including relevant cross references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of obtained performances to previously published methods. 
We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies and show that such comparisons are feasible and useful; however, there may be limitations due to lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench.
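
The abstract notes that the datasets overlap substantially. As a rough illustration of how such redundancy could be quantified, the Python sketch below counts shared variants between every pair of datasets and reports a Jaccard index. It is not the authors' procedure: the `(chrom, pos, ref, alt)` tuple representation, the normalization rules, the function names and the toy data are all assumptions made for this example, and real VariBench files would need their own parsing.

```python
from itertools import combinations


def variant_key(chrom, pos, ref, alt):
    """Normalize one variant to a hashable key (hypothetical representation)."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # treat "chr1" and "1" as the same chromosome
    return (chrom, int(pos), ref.upper(), alt.upper())


def keyed_datasets(datasets):
    """Map each dataset name to the set of its normalized variant keys."""
    return {name: {variant_key(*v) for v in variants}
            for name, variants in datasets.items()}


def pairwise_redundancy(datasets):
    """Return {(name_a, name_b): (shared_count, jaccard)} for every dataset pair."""
    keyed = keyed_datasets(datasets)
    result = {}
    for a, b in combinations(sorted(keyed), 2):
        shared = keyed[a] & keyed[b]
        union = keyed[a] | keyed[b]
        jaccard = len(shared) / len(union) if union else 0.0
        result[(a, b)] = (len(shared), jaccard)
    return result


def unique_variant_count(datasets):
    """Count distinct variants across all datasets after normalization."""
    return len(set().union(*keyed_datasets(datasets).values()))


# Toy data, invented for illustration only.
datasets = {
    "set_A": [("chr1", 100, "A", "G"), ("1", 200, "c", "T"), ("2", 300, "G", "A")],
    "set_B": [("1", 100, "a", "g"), ("2", 300, "G", "A"), ("X", 50, "T", "C")],
}

if __name__ == "__main__":
    total = sum(len(v) for v in datasets.values())
    print("records:", total, "unique variants:", unique_variant_count(datasets))
    for pair, (shared, jaccard) in pairwise_redundancy(datasets).items():
        print(pair, "shared:", shared, "jaccard:", round(jaccard, 3))
```

Comparing the total record count with the number of unique keys gives a quick sense of how much a combined training or test set would shrink after deduplication. This matters when the same variant appears in both the training data and the benchmark data of a predictor.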

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b635/6997940/9bd8806f3ad4/baz117f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b635/6997940/222d78f597d0/baz117f2.jpg
