TabFM: A zero-shot foundation model for tabular data

Original link: https://research.google/blog/introducing-tabfm-a-zero-shot-foundation-model-for-tabular-data/

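Tabular data is the foundation of enterprise machine learning, yet traditional methods such as XGBoost and random forests remain highly labor-intensive. These models need substantial manual work, including tedious hyperparameter tuning and domain-specific feature engineering, before they perform reliably.

To address these bottlenecks, we are introducing **TabFM**, a new foundation model designed for tabular classification and regression. Inspired by the in-context learning (ICL) ability of large language models, TabFM reframes tabular prediction as a zero-shot task. This removes the need for manual model training and complex preprocessing: users can generate high-quality predictions on unseen data in a single forward pass.

By removing the technical barriers that structured data usually brings, TabFM simplifies the machine learning lifecycle. The model is now publicly available on Hugging Face and GitHub.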

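Hacker News discussion: "TabFM: A zero-shot foundation model for tabular data" (research.google), submitted by brandonb.

hodgehog11: On one hand, this is impressive. TabPFN itself is already state of the art and is transforming Bayesian prediction on tabular data (in almost every domain). On the other hand, maybe it's just me, but I think this style of benchmark reporting is unacceptable in this field. TabArena actually reports multiple metrics, because Elo scores do not properly quantify the size of an improvement. It is telling that none of those metrics are shown here. Also, the results section in the GitHub repo is a mess.

kingjimmy: Interesting to see Google release this after SAP's acquisition of Prior Labs.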

Original article

Tabular data constitutes the backbone of enterprise data infrastructure and powers a significant fraction of critical predictive machine learning applications. From predicting customer churn to identifying financial fraud, tabular regression and classification tasks are ubiquitous. For years, supervised tree-based algorithms such as AdaBoost, XGBoost, and random forests have dominated this space, offering robust performance on structured data.

However, the lifecycle of deploying these traditional models presents a significant bottleneck. Fitting an XGBoost model to a new dataset is not merely a matter of a single .fit() step; it invariably requires tedious manual effort. Data scientists must invest countless hours into extensive hyperparameter optimization and domain-specific feature engineering just to extract a reliable signal from the raw data.
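To put a rough number on the tuning effort described above, here is a minimal pure-Python sketch that counts how many model fits an exhaustive grid search with k-fold cross-validation performs. The keys mirror common gradient-boosting hyperparameters, but the candidate values, the fold count, and the helper names `iter_configs` and `count_fits` are hypothetical and chosen only for illustration; no ML library is called.

```python
import math
from itertools import product

# Hypothetical search space for a gradient-boosted tree model. The keys mirror
# common boosting hyperparameters; the candidate values are illustrative only.
search_space = {
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [100, 300, 500],
    "subsample": [0.7, 0.85, 1.0],
}


def iter_configs(space):
    """Yield every hyperparameter combination in the grid as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def count_fits(space, n_folds):
    """Model fits needed by exhaustive grid search with k-fold cross-validation:
    one fit per (configuration, fold) pair."""
    n_configs = math.prod(len(values) for values in space.values())
    return n_configs * n_folds


print(sum(1 for _ in iter_configs(search_space)))  # 108 configurations
print(count_fits(search_space, n_folds=5))  # 540 fits
```

Even this modest grid means 540 fits for one dataset, before any feature engineering, and the search typically starts over when the dataset changes.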

Meanwhile, recent advances in the broader machine learning landscape, particularly the evolution of large language models (LLMs), have changed how we approach novel tasks. LLMs have demonstrated the remarkable power of zero-shot prediction through in-context learning (ICL). This technique lets a pretrained model learn a new task from examples and instructions supplied in its input context, without updating any of the model's weights.
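As a minimal illustration of what "examples and instructions in the input context" means for an LLM, the sketch below assembles a few-shot prompt. The function name `build_icl_prompt` and the `Input:`/`Output:` template are hypothetical choices; no specific model or API is assumed, and the step of sending the prompt to a model is omitted.

```python
def build_icl_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: an instruction, labeled examples, then the query.

    The task is specified entirely by this text; a pretrained model consuming it
    performs no gradient updates. The "Input:/Output:" template is illustrative.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # The query is left unlabeled so the model's continuation is its prediction.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = build_icl_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Loved every minute.", "positive"), ("Total waste of time.", "negative")],
    "Surprisingly good.",
)
print(prompt)
```

Changing the task means changing the examples and instruction in the string, not retraining the model.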

Today, we introduce TabFM, a foundation model designed specifically for tabular data classification and regression. By framing tabular prediction as an ICL problem, TabFM eliminates the need for manual model training, hyperparameter tuning, and complex feature engineering. We are excited to share how this approach allows users to generate high-quality predictions on previously unseen tables in a single forward pass. TabFM is now available on our Hugging Face and GitHub repos.
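The interface this framing implies can be sketched as a single function of labeled context rows and unlabeled query rows, with no separate training step. This is a minimal sketch under loud assumptions: the name `predict_in_context`, its signature, and the Gaussian-kernel-weighted vote inside it are hypothetical placeholders, not TabFM's API or architecture (a tabular foundation model would instead run a pretrained network over the context and query). It covers classification only and shows just the contract: context in, predictions out, nothing fitted or stored.

```python
import math
from collections import defaultdict


def predict_in_context(context_X, context_y, query_X, bandwidth=1.0):
    """Classify query rows using only labeled context rows passed in the same call.

    Toy stand-in for an in-context tabular model: there is no fit() and no
    stored state; each prediction depends only on (context, query). The scoring
    rule here is a Gaussian-kernel-weighted vote, not TabFM's architecture.
    """
    predictions = []
    for q in query_X:
        scores = defaultdict(float)
        for x, label in zip(context_X, context_y):
            sq_dist = sum((a - b) ** 2 for a, b in zip(q, x))
            scores[label] += math.exp(-sq_dist / (2 * bandwidth ** 2))
        # Highest accumulated kernel weight wins.
        predictions.append(max(scores, key=scores.get))
    return predictions


# Hypothetical toy table: two numeric features per row, churn labels.
context_X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
context_y = ["stay", "stay", "churn", "churn"]
query_X = [[0.05, 0.1], [4.9, 5.1]]

print(predict_in_context(context_X, context_y, query_X))  # ['stay', 'churn']
```

Swapping in a different labeled context changes the predictions immediately, with no retraining; that property, not the kernel, is what the sketch is meant to show.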
