180: Reinforcement Learning

2025/3/17

Programming Throwdown

Intro topic: Grills

News/Links:

You can’t call yourself a senior until you’ve worked on a legacy project
https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
Recraft might be the most powerful AI image platform I’ve ever used — here’s why
https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
NASA has a list of 10 rules for software development
https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show

Patrick:
The Player of Games (Iain M. Banks)
https://a.co/d/1ZpUhGl (non-affiliate)
Jason:
Basic Roleplaying Universal Game Engine
https://amzn.to/3ES4p5i

Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

Patrick:
Pokemon Sword and Shield
Jason:
Features and Labels (https://fal.ai)

Topic: Reinforcement Learning

Three types of AI
  Supervised Learning
  Unsupervised Learning
  Reinforcement Learning
Online vs Offline RL
Optimization algorithms
  Value optimization
    SARSA
    Q-Learning
  Policy optimization
    Policy Gradients
    Actor-Critic
    Proximal Policy Optimization
Value vs Policy Optimization
  Value optimization is more intuitive (value loss)
  Policy optimization is less intuitive at first (policy gradients)
  Converting values to policies in deep learning is difficult
Imitation Learning
  Supervised policy learning
  Often used to bootstrap reinforcement learning
Policy Evaluation
  Propensity scoring versus model-based
Challenges in training RL models
  Two optimization loops
    Collecting feedback vs updating the model
  Difficult optimization target
  Policy evaluation
RLHF & GRPO
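
For readers who want something concrete to go with the value-optimization items in the outline, here is a minimal sketch of the tabular SARSA and Q-Learning updates. It is not taken from the episode. The function names, the dictionary-based Q-table, and the ALPHA and GAMMA values are all assumptions chosen for illustration.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (illustrative value, an assumption)
GAMMA = 0.9  # discount factor (illustrative value, an assumption)


def q_learning_update(q, state, action, reward, next_state, actions):
    """Off-policy update: bootstrap from the best action in next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action):
    """On-policy update: bootstrap from the action actually taken next."""
    target = reward + GAMMA * q[(next_state, next_action)]
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def greedy_policy(q, state, actions):
    """Turn values into a policy by picking the highest-valued action."""
    return max(actions, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    actions = ["left", "right"]
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    q[("s1", "left")] = 1.0
    q[("s1", "right")] = 2.0
    q_learning_update(q, "s0", "right", 1.0, "s1", actions)
    print(q[("s0", "right")], greedy_policy(q, "s1", actions))
```

The only difference between the two updates is the bootstrap term: Q-Learning uses the maximum value over next actions, while SARSA uses the value of the action the current policy actually chose. In this small discrete setting, greedy_policy turns values into a policy with a simple argmax over the action list.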

★ Support this podcast on Patreon ★