Accurate predictions on small data with a tabular foundation model.

Authors

Hollmann Noah, Müller Samuel, Purucker Lennart, Krishnakumar Arjun, Körfer Max, Hoo Shi Bin, Schirrmeister Robin Tibor, Hutter Frank

Affiliations

Machine Learning Lab, University of Freiburg, Freiburg, Germany.

Computational Medicine, Berlin Institute of Health at Charité, Universitätsmedizin Berlin, Berlin, Germany.

Publication

Nature. 2025 Jan;637(8045):319-326. doi: 10.1038/s41586-024-08328-6. Epub 2025 Jan 8.

DOI: 10.1038/s41586-024-08328-6
PMID: 39780007
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11711098/

Abstract

Tabular data, spreadsheets organized in rows and columns, are ubiquitous across scientific fields, from biomedicine to particle physics to economics and climate science. The fundamental prediction task of filling in missing values of a label column based on the rest of the columns is essential for various applications as diverse as biomedical risk models, drug discovery and materials science. Although deep learning has revolutionized learning from raw data and led to numerous high-profile success stories, gradient-boosted decision trees have dominated tabular data for the past 20 years. Here we present the Tabular Prior-data Fitted Network (TabPFN), a tabular foundation model that outperforms all previous methods on datasets with up to 10,000 samples by a wide margin, using substantially less training time. In 2.8 s, TabPFN outperforms an ensemble of the strongest baselines tuned for 4 h in a classification setting. As a generative transformer-based foundation model, this model also allows fine-tuning, data generation, density estimation and learning reusable embeddings. TabPFN is a learning algorithm that is itself learned across millions of synthetic datasets, demonstrating the power of this approach for algorithm development. By improving modelling abilities across diverse fields, TabPFN has the potential to accelerate scientific discovery and enhance important decision-making in various domains.
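The abstract describes TabPFN as a learning algorithm that is itself learned across millions of synthetic datasets, each posing the task of predicting a label column from the other columns. The sketch below shows what one such synthetic training task might look like. It assumes a deliberately simple toy prior (a random linear decision rule plus Gaussian noise), which is not the prior used in the paper and is not described in this abstract; the function name `sample_synthetic_task` and its parameters are hypothetical. The network that would read the labelled context rows and predict the hidden query labels is not shown.

```python
import random
from typing import List, Tuple

Row = List[float]


def sample_synthetic_task(
    n_rows: int, n_features: int, n_context: int, seed: int
) -> Tuple[List[Row], List[int], List[Row], List[int]]:
    """Draw one toy binary-classification dataset from a simple random prior.

    Hypothetical toy prior (an assumption, not the paper's prior): features are
    standard normal, and the label is 1 when a randomly weighted sum of the
    features plus Gaussian noise is positive.
    """
    if not 0 < n_context < n_rows:
        raise ValueError("n_context must be between 1 and n_rows - 1")
    rng = random.Random(seed)

    # Sample the hidden "ground-truth" rule that defines this one dataset.
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    noise_scale = rng.uniform(0.0, 0.5)

    X: List[Row] = []
    y: List[int] = []
    for _ in range(n_rows):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        score = sum(w * v for w, v in zip(weights, x)) + rng.gauss(0.0, noise_scale)
        X.append(x)
        y.append(1 if score > 0 else 0)

    # Context rows keep their labels; query rows hold the "missing" labels
    # that a prior-fitted network would be trained to predict from the context.
    return X[:n_context], y[:n_context], X[n_context:], y[n_context:]


if __name__ == "__main__":
    X_ctx, y_ctx, X_qry, y_qry = sample_synthetic_task(
        n_rows=100, n_features=5, n_context=80, seed=0
    )
    print(len(X_ctx), len(X_qry))  # 80 20
```

In a prior-fitted setup, many such tasks would be sampled, and the model's loss would be computed on its predictions for the query labels. A realistic prior would also vary the number of rows and features and the form of the hidden rule far more widely than this toy does.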

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/389a70d27529/41586_2024_8328_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/39ce73b04493/41586_2024_8328_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/72df4243f3fa/41586_2024_8328_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/9028fc7d23a8/41586_2024_8328_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/fc874f52425c/41586_2024_8328_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/a7699e2a101f/41586_2024_8328_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/b1c15a34d9e4/41586_2024_8328_Fig7_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/81ae11ca0102/41586_2024_8328_Fig8_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/8983441156fb/41586_2024_8328_Fig9_ESM.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fc6/11711098/1518ce746782/41586_2024_8328_Fig10_ESM.jpg

Similar articles

1. Accurate predictions on small data with a tabular foundation model.
   Nature. 2025 Jan;637(8045):319-326. doi: 10.1038/s41586-024-08328-6. Epub 2025 Jan 8.
2. Transfer learning for a tabular-to-image approach: A case study for cardiovascular disease prediction.
   J Biomed Inform. 2025 May;165:104821. doi: 10.1016/j.jbi.2025.104821. Epub 2025 Apr 8.
3. Utility-based Analysis of Statistical Approaches and Deep Learning Models for Synthetic Data Generation With Focus on Correlation Structures: Algorithm Development and Validation.
   JMIR AI. 2025 Mar 20;4:e65729. doi: 10.2196/65729.
4. Tabular deep learning: a comparative study applied to multi-task genome-wide prediction.
   BMC Bioinformatics. 2024 Oct 4;25(1):322. doi: 10.1186/s12859-024-05940-1.
5. Predicting dementia in Parkinson's disease on a small tabular dataset using hybrid LightGBM-TabPFN and SHAP.
   Digit Health. 2024 Aug 16;10:20552076241272585. doi: 10.1177/20552076241272585. eCollection 2024 Jan-Dec.
6. MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation.
   PLoS One. 2024 Apr 17;19(4):e0302271. doi: 10.1371/journal.pone.0302271. eCollection 2024.
7. Tabular transformer generative adversarial network for heterogeneous distribution in healthcare.
   Sci Rep. 2025 Mar 25;15(1):10254. doi: 10.1038/s41598-025-93077-3.
8. Time Sequence Deep Learning Model for Ubiquitous Tabular Data with Unique 3D Tensors Manipulation.
   Entropy (Basel). 2024 Sep 12;26(9):783. doi: 10.3390/e26090783.
9. Positional embeddings and zero-shot learning using BERT for molecular-property prediction.
   J Cheminform. 2025 Feb 5;17(1):17. doi: 10.1186/s13321-025-00959-9.
10. Enhanced analysis of tabular data through Multi-representation DeepInsight.
   Sci Rep. 2024 Jun 4;14(1):12851. doi: 10.1038/s41598-024-63630-7.

Cited by

1. Machine Learning Prediction of Cardiovascular Risk in Type 1 Diabetes Mellitus Using Radiomic Features from Multimodal Retinal Images.
   Ophthalmol Sci. 2025 Jul 4;5(6):100874. doi: 10.1016/j.xops.2025.100874. eCollection 2025 Nov-Dec.
2. Deep Learning Predicts Postoperative Mobility, Activities of Daily Living, and Discharge Destination in Older Adults from Sensor Data.
   Sensors (Basel). 2025 Aug 13;25(16):5021. doi: 10.3390/s25165021.
3. Predicting opioid consumption after surgical discharge: a multinational derivation and validation study using a foundation model.
   NPJ Digit Med. 2025 Aug 26;8(1):547. doi: 10.1038/s41746-025-01798-6.
4. Deep Learning Multi-Modal Melanoma Detection: Algorithm Development and Validation.
   JMIR AI. 2025 Aug 13;4:e66561. doi: 10.2196/66561.
5. Hierarchical, Interactive, and Dynamic Predictive Capacity of Current Biological, Psychological, Social, and Environmental Measurements in Depression, Anxiety, ADHD, and Social Quality across the Lifespan.
   Res Sq. 2025 Jul 30:rs.3.rs-7060126. doi: 10.21203/rs.3.rs-7060126/v1.
6. Enhancing early gestational diabetes mellitus prediction with imputation-based machine learning framework: A comparative study on real-world clinical records.
   Digit Health. 2025 Jul 29;11:20552076251352436. doi: 10.1177/20552076251352436. eCollection 2025 Jan-Dec.
7. MRI Delta-Radiomics and Morphological Feature-Driven TabPFN Model for Preoperative Prediction of Lymphovascular Invasion in Invasive Breast Cancer.
   Technol Cancer Res Treat. 2025 Jan-Dec;24:15330338251362050. doi: 10.1177/15330338251362050. Epub 2025 Jul 22.
8. Ensemble learning for microbiome-based caries diagnosis: multi-group modeling and biological interpretation from salivary and plaque metagenomic data.
   BMC Oral Health. 2025 Jul 17;25(1):1188. doi: 10.1186/s12903-025-06590-2.
9. Evaluating large language models on hospital health data for automated emergency triage.
   Int J Comput Assist Radiol Surg. 2025 Jul 16. doi: 10.1007/s11548-025-03475-1.
10. RPSLearner: A novel approach based on random projection and deep stacking learning for categorizing NSCLC.
   bioRxiv. 2025 May 7:2025.05.01.651699. doi: 10.1101/2025.05.01.651699.