
881: Beyond GPUs: The Power of Custom AI Accelerators, with Emily Webber

2025/4/22

Super Data Science: ML & AI Podcast with Jon Krohn

People
Emily Webber
Topics
Emily Webber: Much of my success in computer science comes from the focus and calm problem-solving ability I developed through years of meditation and Buddhist practice. As a solutions architect at AWS, my job was to work closely with customers, understand their needs, and explain how AWS services could meet them. SageMaker is AWS's managed machine learning infrastructure; it provides a complete development environment and tooling for training, deploying, and managing machine learning models. I moved from SageMaker to the Trainium and Inferentia team because I saw the importance of foundation models and the critical role efficient hardware infrastructure plays in training and deploying large models.

In AI accelerator programming, a kernel is a user-defined function that executes directly on the chip rather than being generated by the compiler, which lets you optimize the computational efficiency of your algorithm. The AWS Neuron SDK provides a set of tools that let developers use Trainium and Inferentia accelerators from the framework of their choice (such as PyTorch or JAX), without needing deep knowledge of the underlying chip architecture. Trainium and Inferentia also integrate well with SageMaker: customers can use SageMaker's development environment and tooling to train and deploy models that run on Trainium and Inferentia chips. Compared with GPUs, accelerators like Trainium and Inferentia offer better price-performance and energy efficiency, which makes them especially well suited to training and deploying large AI models.

Trainium 2 delivers four times the compute of Trainium 1, mainly because the number of NeuronCores per card increased to eight and high-bandwidth memory capacity was expanded. Trainium 2 also introduces the UltraServer architecture, which connects multiple Trainium 2 instances together for even higher performance and efficiency. TP (tensor parallel degree) is the number of cores used to process a single tensor, and it affects the efficiency of both training and inference. Changing the TP degree can improve performance, but it can also degrade it, depending on factors such as collective operations, memory usage, and batch size.

Customers including Anthropic, a number of startups, and Databricks are using Trainium and Inferentia chips to train and deploy language models of all sizes, and benefiting from them. The "Build on Trainium" program offers $110 million in cloud compute credits to academics doing frontier AI research. Choosing the right AWS instance type depends on the specific needs of the machine learning task — for example, training versus inference, and the size and complexity of the model. I'm passionate about teaching because I love simplifying complex technical concepts and sharing my understanding with others, helping them learn and use AWS services more effectively. Looking ahead, the technical challenges in AI will center on improving large language models, application integration, and efficient training and inference, and hardware will keep evolving alongside them.
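The tensor-parallelism idea described above can be sketched in a few lines of NumPy: a single matrix multiply is sharded column-wise across a number of cores equal to the TP degree, each core computes a partial result independently, and a gather (a collective operation) reassembles the output. This is a conceptual illustration only — the function and variable names here are hypothetical and not part of the AWS Neuron SDK.

```python
import numpy as np

def tensor_parallel_matmul(x, w, tp_degree):
    """Illustrative TP sketch: shard the weight matrix w column-wise
    across tp_degree simulated cores, compute each partial product
    independently, then concatenate (the 'gather' collective)."""
    shards = np.array_split(w, tp_degree, axis=1)  # one shard per core
    partials = [x @ shard for shard in shards]     # independent per-core work
    return np.concatenate(partials, axis=1)        # collective gather

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))   # activations
w = rng.standard_normal((16, 32))  # weights

out_tp = tensor_parallel_matmul(x, w, tp_degree=8)
out_ref = x @ w
# TP changes the execution schedule, not the mathematical result:
print(np.allclose(out_tp, out_ref))
```

On real hardware the shards run concurrently, so raising the TP degree shrinks per-core work but adds collective-communication cost — which is why, as noted above, tuning TP can either help or hurt end-to-end performance.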

Deep Dive

Chapters
Emily Webber's career path transitioned from international finance to AI and machine learning at AWS, driven by a love for data science and a desire to create positive impact. Her background includes a degree in finance, Buddhist studies, and a master's degree combining public policy and computational analysis. This unique background shaped her approach to problem-solving and technology development.
  • Transitioned from international finance to AI/ML
  • Studied Buddhism and meditation
  • Master's in public policy and computational analysis
  • Worked on distributed systems for AWS SageMaker

Shownotes Transcript

Emily Webber speaks to Jon Krohn about her work at Amazon Web Services: from its Annapurna Labs-developed Nitro System, a foundational technology that enhances security and performance in the cloud, to how Trainium2 became AWS' most powerful AI chip, with four times the compute of Trainium. Hear the specs of AWS's chips and when to use them.

Additional materials: www.superdatascience.com/881

This episode is brought to you by ODSC, the Open Data Science Conference.

Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.

In this episode you will learn:

  • (08:36) Emily’s work on AWS’ SageMaker and Trainium 

  • (23:54) How AWS Neuron lets builders tailor their approach to using frameworks 

  • (29:07) Why using an accelerator is better than using a GPU 

  • (35:29) The key differences between AWS Trainium and AWS Trainium2 

  • (52:45) How to select between AWS Trainium and AWS Trainium2