180: Reinforcement Learning

2025/3/17
Programming Throwdown

Intro topic: Grills

News/Links:

 

Book of the Show

Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

  • Patrick:

      • Pokemon Sword and Shield

  • Jason:

      • Features and Labels (https://fal.ai)

Topic: Reinforcement Learning

  • Three types of AI

      • Supervised Learning

      • Unsupervised Learning

      • Reinforcement Learning

  • Online vs Offline RL

  • Optimization algorithms

      • Value optimization

          • SARSA

          • Q-Learning

      • Policy optimization

          • Policy Gradients

          • Actor-Critic

          • Proximal Policy Optimization

  • Value vs Policy Optimization

      • Value optimization is more intuitive (value loss)

      • Policy optimization is less intuitive at first (policy gradients)

      • Converting values to policies in deep learning is difficult

  • Imitation Learning

      • Supervised policy learning

      • Often used to bootstrap reinforcement learning

  • Policy Evaluation

      • Propensity scoring versus model-based

  • Challenges in training an RL model

      • Two optimization loops

          • Collecting feedback vs updating the model

      • Difficult optimization target

      • Policy evaluation

  • RLHF & GRPO
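
The outline lists SARSA and Q-Learning as the two value optimization algorithms. As a rough illustration of how they differ, here is a minimal tabular sketch in Python. It is not taken from the episode: the five-cell corridor environment, the function names (step, epsilon_greedy, train), and all hyperparameters are assumptions chosen only to keep the example small. The two methods share everything except the bootstrap target inside the loop.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (an assumption, not from the episode):
# a 1-D corridor of 5 cells. The agent starts in cell 0 and gets
# reward 1.0 when it reaches the last cell, which ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties randomly so the untrained agent does not always go left.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(method, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular value optimization; method is 'sarsa' or 'q_learning'."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to an estimated value
    for _ in range(episodes):
        state, done = 0, False
        action = epsilon_greedy(q, state, epsilon, rng)
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            if done:
                target = reward
            elif method == "sarsa":
                # On-policy: bootstrap from the action actually taken next.
                target = reward + gamma * q[(next_state, next_action)]
            else:
                # Off-policy: bootstrap from the best action in the next state.
                target = reward + gamma * max(
                    q[(next_state, a)] for a in ACTIONS
                )
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action = next_state, next_action
    return q
```

Calling train("sarsa") and train("q_learning") returns one value table each; comparing q[(s, +1)] with q[(s, -1)] for each cell s shows which direction each learned policy prefers. In this sketch, SARSA's estimates reflect the occasional exploratory step of the epsilon-greedy policy it is following, while Q-Learning's estimates describe the greedy policy regardless of how the data was collected.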

★ Support this podcast on Patreon ★