863: TabPFN: Deep Learning for Tabular Data (That Actually Works!), with Prof. Frank Hutter

2025/2/18

Super Data Science: ML & AI Podcast with Jon Krohn

People
Frank Hutter
Jon Krohn
Topics
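Jon Krohn: Deep learning has made remarkable progress on images, audio, and natural language, but progress on tabular data has been slow. TabPFN opens up a new way to tackle this problem.

Frank Hutter: Tabular data differs from other kinds of data: datasets are usually small and heterogeneous, and the features are typically already defined. Deep learning excels at feature extraction, but tabular data does not need that kind of feature extraction. TabPFN uses a GPT-like transformer architecture that performs in-context learning: it takes the entire training set and the test set as input and directly predicts the test-set outputs, without explicitly learning features. TabPFN is trained on synthetic data, generated from a prior distribution over what datasets might look like. Prior-Data Fitted Networks (PFNs) build on Bayesian inference: by sampling data from the prior and applying supervised learning, they directly approximate the posterior predictive distribution and avoid the expensive computations of explicit Bayesian inference. Compared with v1, TabPFN v2 is substantially better at handling different data types, missing values, outliers, and larger datasets, which makes it applicable to a much wider range of problems. Without any training specific to time series, TabPFN v2 achieves state-of-the-art performance on time-series forecasting tasks. The company Prior Labs aims to commercialize TabPFN and to build products that make it easier for a broad audience to use.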
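The in-context learning described above can be illustrated with a deliberately tiny, hypothetical stand-in. TabPFN itself is a trained multi-layer transformer; this sketch only mimics the interface: nothing is fit to the training rows, the training set is simply part of the input, and each test row gets its prediction by attending over all training rows in a single pass. The function name `in_context_predict` and the distance-based attention scores are assumptions for illustration only; with one fixed attention step like this, the method reduces to kernel (Nadaraya-Watson) regression.

```python
import numpy as np


def in_context_predict(x_train: np.ndarray, y_train: np.ndarray,
                       x_test: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Hypothetical toy: predict test labels by attending over all training rows.

    No parameters are fit to (x_train, y_train); the training set is just part
    of the input. Scores are negative squared Euclidean distances, so this
    single attention step is kernel regression, not TabPFN's real architecture.
    """
    # Pairwise squared distances between test and training rows, shape (n_test, n_train).
    diff = x_test[:, None, :] - x_train[None, :, :]
    scores = -np.sum(diff ** 2, axis=-1) / temperature
    # Numerically stable softmax over the training rows.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Each prediction is an attention-weighted average of the training labels.
    return weights @ y_train


if __name__ == "__main__":
    x_train = np.array([[0.0], [1.0], [10.0], [11.0]])
    y_train = np.array([0.0, 0.0, 1.0, 1.0])
    x_test = np.array([[0.5], [10.5]])
    print(in_context_predict(x_train, y_train, x_test))  # close to [0, 1]
```

Using raw distances instead of learned query and key projections keeps the example parameter-free. In a real transformer those projections, and many stacked layers, would be learned during pretraining on synthetic data.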

Shownotes

Jon Krohn talks tabular data with Frank Hutter, Professor of Artificial Intelligence at Universität Freiburg in Germany. Despite the great strides deep learning has made in analysing images, audio, and natural language, tabular data has remained a seemingly insurmountable obstacle for it. In this episode, Frank Hutter explains how he found a way around this obstacle, even when data is limited, using a ground-breaking transformer architecture. This approach, named TabPFN, vastly outperforms other methods, as a write-up of TabPFN’s capabilities in Nature attests. Frank talks about his work on version 2 of TabPFN, the architecture’s applicability across industries, and how TabPFN returns accurate results on real data despite being trained on synthetic data.
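Learning from synthetic data rests on the PFN training principle: sample many datasets from a prior, hold out one point from each, and train a model to predict that point from the rest. The predictor that minimizes this expected cross-entropy is the Bayesian posterior predictive distribution. The following is a minimal toy sketch under strongly simplified assumptions, not TabPFN's implementation: the prior is a uniform distribution over a coin's bias, each "dataset" is `n_flips` coin flips, and the transformer is replaced by a hypothetical lookup table keyed on the number of heads (a sufficient statistic here), whose cross-entropy minimizer is simply the per-bucket empirical frequency. The function names are illustrative. The fitted table should approach Laplace's rule of succession, (k + 1) / (n + 2), which is the exact posterior predictive under this prior.

```python
import numpy as np


def train_pfn_table(n_flips: int, n_datasets: int, seed: int = 0) -> np.ndarray:
    """Hypothetical toy 'PFN': fit q(next flip = 1 | k heads) on prior samples.

    Each synthetic dataset: draw a coin bias theta ~ Uniform(0, 1) (the prior),
    flip it n_flips times (the in-context training set), then flip once more
    (the held-out target). For a lookup table, the cross-entropy minimizer is
    the empirical frequency of target == 1 within each bucket k.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, size=n_datasets)          # sample from the prior
    k = rng.binomial(n_flips, theta)                          # heads seen in context
    target = (rng.uniform(size=n_datasets) < theta).astype(float)  # held-out flip
    heads = np.bincount(k, weights=target, minlength=n_flips + 1)
    counts = np.bincount(k, minlength=n_flips + 1)
    # Empty buckets (possible with very few datasets) are left as NaN.
    return np.divide(heads, counts, out=np.full(n_flips + 1, np.nan),
                     where=counts > 0)


def exact_posterior_predictive(n_flips: int) -> np.ndarray:
    """Laplace's rule of succession: p(next = 1 | k) = (k + 1) / (n + 2)."""
    k = np.arange(n_flips + 1)
    return (k + 1) / (n_flips + 2)


if __name__ == "__main__":
    n = 5
    learned = train_pfn_table(n, n_datasets=200_000)
    exact = exact_posterior_predictive(n)
    for k_val, (q, p) in enumerate(zip(learned, exact)):
        print(f"k={k_val}: fitted {q:.3f}  exact {p:.3f}")
```

Running it prints the fitted and exact probabilities side by side for k = 0 to 5, and with this many synthetic datasets they should agree closely. A real PFN replaces the table with a transformer that reads the raw context, so it must also generalize across dataset sizes and feature types.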

This episode is brought to you by ODSC, the Open Data Science Conference. Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

In this episode you will learn:

  • (05:57) All about the TabPFN architecture 

  • (21:27) Use cases for Bayesian inference

  • (35:07) On getting published in Nature

  • (44:03) How TabPFN handles time series data

  • (51:52) All about Prior Labs

Additional materials: www.superdatascience.com/863