CUDA is a parallel computing platform developed by NVIDIA that allows developers to use GPUs for general-purpose processing. GPUs, with thousands of cores, excel at handling simple tasks in parallel, making them ideal for tasks like deep learning, video editing, and fluid simulations. CUDA accelerates these tasks by enabling fast mathematical operations across many cores, which would take significantly longer on a CPU.
GPUs are essential for training LLMs because the core operations in these models, such as matrix multiplication and activation functions, can be parallelized. Matrix multiplication, for example, involves solving a large puzzle where each piece can be processed independently. GPUs, with their thousands of cores, can handle these operations much faster than CPUs, making them indispensable for training and running LLMs efficiently.
Elliot approaches learning by diving into 'rabbit holes' of complex topics, taking extensive notes on his learning journey, and identifying pain points. He then uses these insights to teach others effectively. His method involves understanding the difficulty of a topic before mastering it, which allows him to explain concepts in a way that is accessible to beginners. This approach has been particularly effective in his courses on CUDA and building LLMs from scratch.
CUDA is used in a wide range of applications beyond AI and deep learning, including cryptocurrency mining, graphics rendering, video editing, and fluid simulations. Its ability to perform fast mathematical operations in parallel makes it a versatile tool for any task that requires high computational throughput.
Elliot emphasizes the importance of sleep, aiming for eight hours a night, as it significantly boosts his productivity. He also maintains a healthy diet and has recently started incorporating exercise into his routine. Additionally, he uses time-lapse videos to document his coding sessions, which helps him stay motivated and focused during long work periods.
NVIDIA's approach involves simulating chip designs before sending them to foundries for production. This allows them to iterate quickly and reduce the risk of errors. By relying on simulations rather than physical prototypes, NVIDIA can innovate faster and more efficiently, which has contributed to their success in the GPU market.
Elliot believes that while scaling up models like GPT has been effective, future advancements will likely come from architectural innovations and improving data quality. He predicts that researchers will find ways to 'hack' scaling laws, making models more efficient and capable without simply increasing their size. Additionally, he foresees the development of entirely new architectures beyond transformers, which could lead to even more powerful AI systems.
Elliot starts by reading the abstract to understand the paper's main idea, then skims through sections like introduction, related work, and results. He focuses on keywords, bold text, and images to grasp the core concepts. For deeper understanding, he uses tools like Google Search, Perplexity, or AI models like Claude to clarify unfamiliar terms. He also emphasizes the importance of implementing algorithms from papers in tools like Jupyter notebooks to solidify his understanding.
Elliot recommends three key papers for beginners: 'Attention Is All You Need,' which introduces the transformer architecture; 'A Survey of Large Language Models,' which provides a high-level overview of LLMs; and 'QLoRA: Efficient Finetuning of Quantized LLMs,' which focuses on efficient fine-tuning techniques. These papers offer a solid foundation for understanding the core concepts and advancements in LLMs.
Elliot believes that while a computer science degree is valuable, especially for beginners, self-directed learning through projects and experimentation can be more effective for those who are serious about mastering the subject. He argues that hands-on experience and tinkering with code can accelerate learning and provide deeper insights than traditional coursework. However, he acknowledges that a degree can still be beneficial for certain job opportunities and structured learning.
Sometimes people will look at something and they'll see the surface area of it. Like they'll see the surface area of this whole cube. And by the time they've like jumped on the surface, they see that it's like, oh, this thing is solid. I can't just bounce on it anymore. And then there's like some depth to it as well. So you see all these other things that are happening and it's like, oh, I'll just go and learn that. And then you realize when you jump on that part of the cube, it's like, oh, there's a whole like –
There's a whole line straight down and a bunch of other things that come with it, right? So that definitely took me by storm. That stuff always gets me, but it was worth it. Welcome back to the Free Code Camp Podcast, your source for raw, unedited interviews with developers.
Today's musical intro with yours truly on the drums, guitar, bass, and keys, 1988 Double Dragon II, Into the Turf. ♪♪♪
Welcome back to the Free Code Camp podcast. I'm Quincy Larson, teacher and founder of FreeCodeCamp.org. Each week we're talking with developers, founders, and ambitious people in tech. This week we're talking with Elliot Arledge. He's a 20-year-old computer science student who's created several popular Free Code Camp courses on LLMs, the Mojo programming language, and GPU programming with CUDA. He joins us from Edmonton, Alberta, Canada.
Before we talk to Elliot, support for this podcast comes from a grant from Wix Studio. Wix Studio provides developers tools to rapidly build websites with everything out of the box, then extend, replace, and break boundaries with code. Learn more at wixstudio.com. Support also comes from the 11,043 kind folks who support Free Code Camp through a monthly donation. Join these kind folks and help our mission by going to freecodecamp.org slash donate.
Elliot, welcome to the podcast. Howdy. I'm glad to be here. Thanks, Quincy. Yeah. And you are the youngest person I've ever interviewed on the Free Code Camp podcast at the ripe young age of 20. And I'm so happy to be here.
And I just want to thank you again for creating this awesome CUDA course that has been extremely popular. Like a lot of people are getting into GPU programming right now. Can you tell us what CUDA is and maybe talk a little bit about its history and why you felt the need to learn this pretty advanced, pretty difficult to learn tool? Yeah. So, I mean, the whole thing with CUDA is you start off with...
You start off with these CPUs, right? CPUs are really good at doing complex tasks, but they have a very low amount of cores, like maybe six or eight cores.
And then it's like, okay, I can do tasks across, you know, a small number of cores, and do pretty hard ones. But what if I want to do simple tasks across a lot of cores, right? That's where the GPU came from. And so NVIDIA built up on that a lot, and they released a piece of software called CUDA. So, the CUDA parallel programming platform, you could say.
CUDA is used in a lot of different ways. So you have, I think my list on this is too long. I haven't even memorized it. It's like, you can use it for like cryptocurrency mining, graphics, deep learning, um,
video editing, fluid simulation. There's just so many ways you can use it. It's just that idea of running really, really fast mathematical operations in parallel. That's the whole point. Yeah. And so just to define a couple terms you used, CPU, central processing unit, what people use...
You know, for most of personal computing, basically this central processor really only has one thread. Maybe it's like a multi-core CPU that has like eight cores or something, can handle a bunch of threads, right? But that's nothing compared to handling thousands of threads. That's nothing compared to, for example, like a 4090. Some of the people watching, you might have a 4090 or a 3090. One of those might have anywhere from like 8,000 to 16,000 cores in it, right? Yeah.
Yeah, and these are basically... It's quite a big difference, yeah. These are basically like simpler, not as powerful and sophisticated as the CPUs, but by virtue of having so many of them, you can do things like fluid dynamic simulations where you're simulating the location of a whole bunch of different particles and stuff like that, which a traditional CPU would just have to batch process all those in order, and it just takes too much time, right? Things that you can do with a GPU in an hour might take...
days and days to do with a CPU, right? Yeah. Yeah, exactly. Right. It's like, if I have, say, a million jobs to do, and on a CPU I have 10 workers that are really, really fast, working at, I don't know, like,
speed units of 10. And then I have a GPU with, you know, 10,000 workers that can work at speed units of, like, five, or two. The GPU is still, like, orders of magnitude faster at these parallel tasks. So yeah, it's crazy where the technology has come. Yeah. Yeah.
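To make that arithmetic concrete, here's the analogy worked out in a few lines of Python. The worker counts and "speed units" are the made-up numbers from the analogy above, not real hardware specs:

```python
# Throughput comparison using the made-up numbers from the analogy.
cpu_workers, cpu_speed = 10, 10       # a few fast cores
gpu_workers, gpu_speed = 10_000, 2    # many slower cores

cpu_throughput = cpu_workers * cpu_speed   # 100 jobs per time unit
gpu_throughput = gpu_workers * gpu_speed   # 20,000 jobs per time unit

print(gpu_throughput / cpu_throughput)     # 200.0 -- two orders of magnitude
```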
Yeah, and you're right there on the vanguard, like, learning how to use this tool. Obviously, CUDA is not, like, a new tool. I think it's been around for maybe, like, 15 or 20 years, created by NVIDIA to make, like, use of their powerful GPUs. NVIDIA, of course, the 3D graphics card company that we all used as, like, teenagers. If you're, like, you know, a little – if you're a Gen Xer like me, like, you would just go and you –
save, you know, many months of, like, mowing lawns and stuff. And you'd buy a graphics processor and then you could play Quake with, like, GL graphics instead of, you know... Yes. ...CPU-processed, like, pixelated graphics and stuff. It made everything really smooth. Like, fast forward, you know, 25 years or so. Yeah. And this company is, like, the most valuable company on earth because it turns out, for a lot of industrial applications, uh,
you know, you do need a whole lot of processing power and you need that processing power in parallel. Yeah. So they created this language. Yeah. And maybe, I'm sorry, go ahead.
I was going to say, man, NVIDIA, they've just climbed the ladder so high. I looked at the stock yesterday, and I was like, oh my gosh, up 183% in the last year. They're at around $3.2 trillion, I think, now, which is insane when you think about that. And it's, like, a graphics card and parallel processor building company. They don't even build the hardware. They write the blueprints for the hardware and go and pay someone else to print it. They just design the architecture for it and the software to make it run fast.
It's phenomenal what they've done. Yeah. And we could talk about it for a long time, because it's an incredibly innovative company in terms of the way they basically simulate everything instead of actually producing a working chip first. They just simulate everything and they're like, we think this will work. And then they ship it off to the foundry, and the foundry spends billions of dollars printing these, hoping that their
simulations were accurate and the chip actually works. And yeah, they've had remarkable success with that and it's allowed them to iterate much faster than traditional chip making companies. So let's talk about what this is good for in terms of
building AI systems, which I know is an area that you spend a lot of time with. Of course, uh, you've created a course on rolling your own LLM from scratch, basically, uh, that's on free code camp. I've linked to several courses that you developed, um, in the show notes or on the video description, depending where you're watching this, but let's talk about why GPUs are so important for LLMs specifically. Yeah. So, I mean, the core of this all is really just, uh,
two main operations that happen in these, well, these large language models. You could simplify that to just language models, and language models right now are generally based on the transformer architecture, right? So you might have heard the term GPT before. That stands for generative pre-trained transformer. And these transformers consist mostly of just matrix multiplies — so, matrix multiplication — and activation functions.
Now, the reason why CUDA accelerates those so much is because matrix multiplication is like solving a giant puzzle. So imagine you're trying to put a puzzle together, right? And you don't need one piece in order to place another; you can do all the pieces independently, and they're not dependent on each other, right? So when you're able to take the pieces and batch them efficiently across all the different cores on that hardware, you can do a matrix multiply really, really fast, right?
For those who have done linear algebra: it's doing the dot product of the rows with the columns, and doing thousands, millions, billions of those. I think I ran a benchmark while I was doing the CUDA course to multiply a 4,096 by 4,096 matrix by itself. So, like, a lot of numbers to multiply. That was done in...
I think, like, less than a tenth of a second on my GPU down here.
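He doesn't say exactly how he ran that benchmark, but a minimal sketch of the same measurement in PyTorch (assuming a CUDA-capable GPU; exact timings will vary by hardware) looks like this:

```python
import time
import torch

# Multiply a 4096 x 4096 matrix by itself on the GPU.
a = torch.randn(4096, 4096, device="cuda")

torch.cuda.synchronize()   # make sure setup work is done before timing
start = time.time()
b = a @ a                  # one big matrix multiply, parallelized across cores
torch.cuda.synchronize()   # wait for the GPU to actually finish
print(f"{time.time() - start:.4f} seconds")
```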
So it's insane. So there's the matrix multiply, which is like the main operation in deep learning, right? And then there's the other one, which is activations, where you take each number in a matrix or a tensor and do a function on it — so, for example, sine or cosine or ReLU, right? It's just a simple operation applied to each number independently.
Like, it's very easy to throw parallel computing at that stuff and make it run fast. So that's mainly why you see these speedups. Yeah. I mean, if you go and run a language model on a CPU and then go and run it on a GPU over here, it's going to be a massive difference. Like, you might not be able to see anything useful come out because it's so slow on the CPU side. So yeah, it's crazy, man.
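The same point in code: an activation touches every element independently, which is exactly the shape of work a GPU is built for. A sketch, under the same assumed setup as the benchmark above:

```python
import torch

x = torch.randn(4096, 4096, device="cuda")

# Every one of the ~16.8 million elements is transformed independently,
# so the GPU can spread the work across thousands of cores at once.
relu_out = torch.relu(x)   # max(0, x), elementwise
sine_out = torch.sin(x)    # sin(x), elementwise
```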
Yeah, well, just to define some of the terms. So, of course, matrix multiplication, where you have a big grid of numbers and you're multiplying them together. And I love that puzzle analogy, by the way. That's like just because if you think about it, like everybody has put together a puzzle and they know that like, OK, like you can actually get a lot of people together putting a puzzle together and it actually gets faster. Right.
Solving a crossword does not necessarily get faster with multiple people because it might get a little bit faster, but there's diminishing returns. But if you had 100...
People solving, like, a hundred-thousand-piece puzzle, you better believe that it's going to be done, I don't know, at least probably, like, 50x faster than having a single person do it. Right. Like, there are probably diminishing returns to parallelism, but not when the puzzle is big enough. Right. And the puzzle is quite big when you're doing — what did you say? You said you were doing a 4,096 by 4,096 matrix, and then that times itself. Okay.
Yeah. So, yeah, it's insane. I mean, typically in deep learning, you don't have, like, literally 4,096 by 4,096. It's typically, like, batch size by the number of tokens in the context window, times the channels or the embedding dimension. And that's typically not, like, super, super high. Maybe it is, I don't know, but it's not a...
When you're able to have this run on really, really fast hardware like H100s — I think that's what xAI uses, and OpenAI and Anthropic and all them. I think Anthropic uses some other stuff like AWS Trainium as well. But yeah, you get the point. When you have specific hardware that's designed to do very, very fast, high-throughput computing tasks, it's a different ballgame for sure.
Yeah. Well, we could talk about GPUs and we could talk about CUDA all day, but I'm really interested in how you are managing to learn all this stuff. Because, certainly, if you want to learn how to build AI systems, check out Elliot's courses. He's got a comprehensive CUDA course. He's also got a build-AI-systems-from-scratch-with-Python course. But if you want to learn how to learn...
This man is a paragon of just sitting down and cranking. You've got these time lapses that you do where you just sit down and you code for like 12 hours and it's just like super sped up. It's like two minutes long of you just drinking, getting up to go to the bathroom and just grinding basically. And it's really cool. It's like hyperkinetic. I love it. And it just gets me jazzed and it gets me fired up to sit down and get some work done myself because –
there's nothing like watching somebody do work to inspire you to do work yourself. It's like watching an NBA game and then going to the gym and hitting the weights, or going to the court and shooting some hoops. You're kind of vicariously getting fired up. And you've shared your methodology through which you create those as well. If anybody wants to create a super-fast,
sped-up version of, like, what they did at their desk or something — or you can do it on your screen. I've seen a lot of people — like, there was this, uh, game that was focused around learning to code. I can't remember the name, but the guy who ran it would always post these, like, I did a 24-hour code session, and it would be his screen, and him moving around different elements and jumping back and forth between the back end and the front end, and
getting the browser up. Those were super chill. That was like 10 years ago or something. But like, I was like, man, that's so awesome. Like, it's like watching Twitch on like 100x speed or something, you know, watching live coding on Twitch. So, so yes, you have the energy, I'd be interested in actually hearing a little bit about your routine, and how you keep your energy up before we dive into how you learn. Because you do seem to have an abundance of energy.
I do have — I guess you could say — I mean, I am 20 years old. Maybe that helps. Um, I've generally been pretty good at, like, feeding healthy food into my body. My exercise has not been amazing, but, like, right now, in the past week, I've really started to notice a difference, like a massive difference. You know, I've been sleeping eight hours. That's like a superpower, sleeping for eight hours.
You would not believe, right? So you get these like 16 hours in a day. Maybe like one or two or three of them are spent doing things and they're just like organizing. And then you get another like 13, 14 hours in the day that are reserved for just like working or studying or your job or whatever it is. Maybe if you have a family, you're spending time with your family, right? Maybe you have like maybe 10, 8 or 10 hours a day to learn.
Now it's like you get those hours, you spend the rest of them doing what you can. And then those eight hours, you have those reserved, maybe seven and a half of it if it's like a really, really busy day. But you want to keep it, you want to aim for eight hours every single day. And that is a superpower. Like every time I try to push it to like five or six hours, I end up finding that my week is ruined.
Like, I can still get some stuff done, but it's not quite the same. So I would say sleep is really important. You know, I know for developers it's hard to get sleep, because your brain is always in that phase of just keep on going and going and going, and it'll be fine, I'll just wake up early the next day. Yeah. Yeah.
I mean, when you're trying to solve a problem, you got to get the sleep, you know? Yeah. When you're trying to solve a problem and you, and you get that monomaniacal focus on solving that problem, like, damn, why isn't this working? You know, like it can be very hard to say, Oh, I got to sleep. Yeah. Yeah.
Yeah, I mean, if you can have that energy that you maybe get for a few hours on like low amount of sleep, you get – if you sleep for eight hours, you get that for the whole day. So it's like pick and choose. If you weren't fast enough that day, it's like, okay, maybe you should get more done the next day, right? Yeah. That's your fault. Don't like trash your body for that. You know what's cool? Eight hours of sleep. You know what's cooler?
Nine hours of sleep. If you try nine hours of sleep, it'll be hard to go back. You feel like you've taken the limitless pill or something. There's some cheat code that's been activated. IDDQD, whatever the cheat code is in Doom to give you God mode, right? It feels like that because you're just turboing through the day and every interaction feels a lot simpler. I genuinely think that humans used to sleep like this.
10 hours a day, 12 hours. They'd sleep through the entire winter, you know, like basically. Is that real? Yeah. Like humans, like when it was cold outside and stuff, they just stay in, you know, in their caves or whatever structure they, they'd made. And they just sleep a whole lot because it preserves calories. And, um, you know, like,
I think your body like ages less quickly when you're asleep and stuff like that. So it like technically like slightly increases your longevity. There's a lot of science around sleep and I'm not going to pretend to have read very much of it, but I do know that like sleep deprivation builds up, like not just in your brain, but it builds up in like different organs and stuff. And, and like sleeping is like a detoxification. It's like, that's your opportunity, your body's opportunity to go clean up all the, you know,
byproduct that it creates through just, like, living your day, running around, going to the gym, dealing with stressors in your life. You see? See, the thing with nine hours, though: eight is usually my good area. Maybe eight and a half, depends on how sleep-deprived I am, but sometimes eight and a half works. But the second I start going above nine, and like 10 — I've even slept for, like, 13 hours at some point. That's a lot. Oh my gosh. Yeah. I —
I mean, for some of these time lapses, it's like literally 18 hours every single day. And then at the end of the week, you're like so heavily sleep deprived that even 13 hours isn't enough. Yeah. So don't get in the position where you have to make up sleep is my humble advice. Yeah.
No. I mean, if you're sticking to a good routine, that'll generally work. I think after a certain point, you get diminishing returns. So after, say, nine or 10 hours, you'll start to feel like that, quote unquote, limitless pill like you brought up. But after a little while, like after a few hours, I've done this. It feels amazing for the first little bit. You're able to get out. You don't need to stretch or anything. Everything feels so free. And then
All of a sudden, it just hits you like you ate something that's super hard to digest and your body is just working and you don't have energy to think. It's just weird. I feel – I don't know if I'm the only one. Probably not. But I usually get diminishing returns with like more than nine hours of sleep. Well, have you tried napping? Napping? I've done some napping. I think my body doesn't like naps generally. Generally, when I do naps, I find that waking up and getting back into the zone is more difficult.
I don't know. Like if I, if I like roll my chair down and then fall asleep for like two hours and wake back up, I feel like I feel so tired and like not in the zone anymore. Like I have no energy to do stuff. I don't know. Maybe that's just me, but I typically like to maximize what I can with the full day.
Yeah. I mean, for me, napping is like a desperation move. Like you're like really low energy and you're just like, I'm not getting jacked on today. I'm going to take a nap. And like, so it's a last resort for me, but I like, I wake up and I get what I like to call nap taste in my mouth and I feel all groggy and like, it feels like somebody's put a bunch of like acid into my torso or something. I'm just like, ah,
You know, why am I awake? But if you don't keep sleeping, you completely throw off your sleep schedule, you know? But yeah, anyway, we can talk about sleep. Apologies to anybody listening who doesn't care about sleep, but it is important. I think you and I would both agree. Sleeping enough is important. And there are people who are in circumstances where they're just so busy doing things. And it's easy for us to just say, oh, just figure out a way to make
Make time for sleep. Like a lot of people, they have elderly relatives they're taking care of. They've got young kids. They're working multiple jobs to try to pay their outrageous rents. You know, any number of things like that. Like, my heart goes out to you, but if there's a way that you can
figure out how to get your sleep schedule right, it'll make it a lot easier to solve a lot of the other problems in your life. It certainly has helped me. It's kind of like a... what do you call it? It's kind of like a chain reaction effect — I don't know if that's the best word.
A virtuous circle? If you're able to have more energy in the day from maybe one day, you start sleeping an extra hour or two, and then that makes the rest of the day easier to plan. It makes it easier...
Yeah.
So obligatory, neither of us are sleep scientists. These are just our own personal anecdotal observations. But yeah. So let's talk about other things that you do, patterns, habits you've developed over time that have helped you be able to assimilate this massive corpus of programming knowledge and help you sit down and actually do it. Yeah. I mean, there's typically when I'm going to learn something new, it's...
You'll often hear this term rabbit holes, where you go down, you have no idea what you're going into. Maybe you've heard some people mention it. Maybe you saw a quick overview from like Fireship or Free Code Camp on YouTube, and you just want to, you like head down that, and it's like, maybe something interesting will come out of this, right? Those like late night rabbit hole dives. And I find those are typically where I filter out the best ideas. So a lot of, I like to put a lot of thought into
some of the ideas and the products that I build. So especially on free code camp, like I usually have ended up just at like the educational route of just wanting to share some knowledge with people, show someone how to build something because it was an immense pain point on learning how to go and do that. So,
You know, starting off with the rabbit hole, figuring out a bunch of these pain points, and then you write them down, right? This is very important. It's a very important part of how I'm able to teach things — why people say that I teach things well. I'm not sure how good of a teacher I actually am, I don't know. It's kind of like other people observing and telling me, but...
Yeah, when you're able to go through something and have it be difficult to learn and write that stuff down, not just when you understand it, but what you were thinking about before you understood it, then it's much easier to explain things. And it's much easier for people to understand things when it's coming in, right? So when someone has, say, like a PhD in a field and they've gone all the way up here and a lot of this other stuff below has sort of dissipated, it's not gone, but it's dissipated a little bit. And now they're just hovering at the top.
I've talked to a few people like this, and I've even gone through some courses, and yeah, it's a lot harder to learn some things when someone hasn't come, like, all the way back down. So, you know — I mean, this is partially why the LLM course, I think, was so good, and I got a lot of really good feedback on it: because it was such an intensive, note-taking learning journey, right?
I'm not even kidding. I started off with — and this was probably the best decision of my entire life. I was heading back home on the train from school, from university, and I was looking at YouTube on my phone, and I saw this video from a guy named Andrej Karpathy. Some of you might know who he is. Yeah, he's a legend in AI development, AI systems development.
Yeah, yeah, that guy's a genius. I don't know how he does it. But I saw one of his lectures. It was like a two-hour lecture on how to build a GPT from scratch, right? How to build a transformer language model that you can talk to on your own Windows or Linux or Mac computer, right? Whatever it is. And I was looking at this.
And then I couldn't understand, like, the first 10 minutes of it. And I was really bothered. I thought it was really interesting, because I wanted to understand how ChatGPT works. I'm like, I want to understand this. It's so cool. And just going through that and persevering through the first idea of, like, understanding, you know, how to tokenize a word and then detokenize it, right? Like, simple stuff like this is extremely trivial now, after you go through it, but you have to get through those first steps.
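For anyone who hasn't seen it, the lecture he's describing starts with character-level tokenization: map each character to an integer and back. A minimal sketch of that idea (the standard character-level approach, not necessarily Elliot's exact code):

```python
# Character-level tokenizer: build a vocabulary from the training text.
text = "the wonderful wizard of oz"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # string -> integer
itos = {i: ch for ch, i in stoi.items()}       # integer -> string

def encode(s: str) -> list[int]:
    """Tokenize: turn a string into a list of integer token IDs."""
    return [stoi[ch] for ch in s]

def decode(ids: list[int]) -> str:
    """Detokenize: turn token IDs back into a string."""
    return "".join(itos[i] for i in ids)

ids = encode("wizard")
print(ids)            # the IDs depend on the vocabulary built above
print(decode(ids))    # "wizard"
```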
I was like, okay, I understand some of the pain points here. What if, and this was a completely psycho idea. I was like, what if I reach out to Free Code Camp and say, hey, I think I can cover this topic and potentially other ones better. Like I hadn't even gone through the whole thing yet. I literally just reached out to you guys and I was like, hey, can you guys post this video of me? Like when I'm done of me creating a language model from scratch.
And then you got back to me and you're like, yeah, send a demo video over. And I sent it over and you're like, yeah, go and do it. And I was like in this state, I was like totally freaking out. I was like, oh my gosh, this is a real thing now. And so I spent the entire summer just speed running, doing the exact same thing of like figuring out tokenizers and then writing down the pain points and then just going forward through the whole thing. And eventually, and it was actually a lot more difficult than I thought.
I thought it was going to take one month. It ended up taking, like, three or four months. But eventually, I was able to get something that produced reasonable-sounding English. It wasn't, like, conversational, but you could understand what it was doing, right? It was reproducing, I think, the Wizard of Oz text. That's what we trained it on. And yeah, so, like, reaching out over email to you guys, that was the best decision I've ever made. Well, I just want to – That single point? Yeah. Yeah.
I just want to observe that most people learn software engineering by learning from Free Code Camp. You learned software engineering, in this case, by teaching Free Code Camp. You basically created this educational artifact that you kind of, like, learned just in time. Oh, I need to learn this topic well so I can teach this course. And I think that's an amazing approach. It's like almost...
Almost like learning by doing to the next level and like learning by teaching. And that – I've learned a lot of topics that way personally because I wrote like our first HTML, CSS, JavaScript curriculum, the interactive one. Like not all of it was written by me, but a lot of it was. And a lot of things I had to like go back and like really learn the fundamentals so I could teach them.
And it's a great motivation to learn when you're like, people are going to be waiting. Same thing if you're giving, like, a technical talk or something like that. Damn, I can't disappoint people. I really need to know my stuff when I'm up there. And it provides you with this kind of incentive, this do-or-die type thing. I've really got to learn. So you've lit a proverbial fire under your own feet.
And now you have to dance or you burn. I don't know exactly how that analogy works, but that's super, super cool. And again, I just want to thank you again for reaching out. Like we're thrilled that it worked out well. Of course, you were very good with like audio video and all that stuff. And you were clearly passionate about the topic.
And like we get a lot of submissions from people who are just like, oh, yeah, I'm going to read what GPT tells me and it's going to be, you know, but like you clearly actually had like a working knowledge of the different topics and stuff like that. And so it was very easy to say yes. And we're so glad we did.
Yeah, it was quite a journey. I did the same thing with the CUDA tutorial. I'm not going to lie, I did the exact same thing. Andrej Karpathy posted a GitHub repo called llm.c, where they train a GPT in raw CUDA. They reproduce GPT-2. This is an old language model — like, it predates ChatGPT. It's a very old kind of
One of the, you could say, original breakthroughs. Like, when GPT-2 was released, everyone was like, ah, and then GPT-3, ah, right? Everyone was freaking out. Yeah, GPT-2 was a- Yeah, it's like literally a repo where you can-
Yeah. And then GPT-3 obviously was also a whoa moment, but GPT-3.5... GPT-3 was more publicized. Yeah, that's when the whole world was like, whoa. That's when the world snap-changed in, like, a few months. Yeah, it was crazy. But pretty much the project is: you train a whole thing in raw CUDA. So it's just written in, like, raw C and CUDA, and you just train this thing. Yeah.
on, like, Shakespearean text. So yeah, I was looking at that and I couldn't understand a single thing that was happening. I didn't understand a single line except for the Python stuff that was going on. And I was like, oh man, I'm never going to understand this. There's no hope left. I may as well stick with Python. And then — I don't know what it was — I kind of remember some intrusive thought was like, let's go and do this. You know? Yeah.
I don't know. I don't remember, like, what the exact day was when I decided that was a thing. But yeah, I was just like, okay, well, llm.c is clearly a pain point for me as a beginner — someone who understands how to build neural networks, but not in C. I can build them in PyTorch, but I can't build them in C and CUDA. So I was like, you know, maybe we should go down this rabbit hole. That rabbit hole lasted about a week before I decided, like,
Yeah, I might just like go and build a course out of this. So yeah, pretty much the exact same idea, except there wasn't an actual course to base it on. It was like, I have this thing to idolize over here. And it's like, let's shoot for something like that.
I didn't actually get anywhere close to it. I didn't actually get anywhere close to it, but I got far enough to understand how to performance optimize, quote unquote, CUDA kernels, the very, very fast functions. The kernel is what runs on the GPU. I learned how to optimize those, like profile and do all this, and then as well train...
actually train a small neural network called an MLP to learn to classify handwritten digits and give a correct output. So you feed in, like, a 28 by 28 pixel image, and then it outputs, like, a probability distribution over the digits zero through nine. And it's like, pick which one, right? And you pick, like, the highest-probability selection, and
Uh, and I actually built that up. That actually took me probably, like, a month and a half or two months to do that part. But yeah, just learning everything, and then debugging and making sure the data is flowing how I want it to. Yeah. That was quite a learning journey. But you know, that's where you start, right?
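For the shape of that side project: Elliot's version was written in raw C and CUDA, but the same idea sketched at a higher level in PyTorch looks something like this (the hidden-layer size here is an arbitrary illustrative choice):

```python
import torch
import torch.nn as nn

# A small MLP: flatten the 28x28 image, one hidden layer, 10 output logits.
model = nn.Sequential(
    nn.Flatten(),          # 28x28 image -> 784-dimensional vector
    nn.Linear(784, 128),   # hidden layer; 128 is an arbitrary illustrative size
    nn.ReLU(),
    nn.Linear(128, 10),    # one logit per digit, 0 through 9
)

image = torch.rand(1, 28, 28)                # stand-in for one handwritten digit
probs = torch.softmax(model(image), dim=1)   # probability distribution over 0-9
prediction = probs.argmax(dim=1)             # pick the highest-probability digit
print(prediction.item())
```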
So sometimes people will look at something and they'll see the surface area of it. Like they'll see the surface area of this whole cube. And by the time they've like jumped on the surface, they see that it's like, oh, this thing is solid. I can't just bounce on it anymore. And then there's like some depth to it as well. So you see all these other things that are happening and it's like, oh, I'll just go and learn that. And then you realize when you jump on that part of the cube, it's like, oh, there's a whole like –
There's a whole line straight down and a bunch of other things that come with it. So that definitely took me by storm. That stuff always gets me, but it was worth it. Amazing. Yeah, that's such a great analogy. Just like the notion of, I think a lot of people bite off more than they can chew. Their eyes are bigger than their stomach. Oh, how hard could that be? And then they realize, oh, damn, there are a lot of dependencies here. There's a lot of stuff. And that's where a lot of people stop.
A lot of people jump on the cube and they're just like, damn, my legs hurt. What the heck? You know, like this isn't squishy at all. Like here in Dallas, Texas, where I live, I live in Plano. We had this like the world's largest bouncy castle and the entire thing was massive and like,
You could climb on top of this giant hill, basically. There was a giant bouncy castle hill. And you could stand there and you could see for miles. And the whole thing was surrounded by bouncy castle stuff. And I took my kids there and it was just a phenomenal invigorating experience. I think it's called Funbox. It's seasonal. It's not here all the time. But if you're in Dallas or maybe they bring it to other cities, definitely go. It's worth the $20 or whatever to jump around on the world's largest bouncy castle and just...
Have an amazing time. But, but like, you know, when you describe that cube, like,
there were lots of points where I was like, oh, this looks really easy. It won't be hard at all to get up there. And what ends up happening is, like, I'm kind of crawling very meticulously to try to get up there. And so that kind of reminds me of, like, if I want to grok something deeper, right? Yes, you can absolutely implement neural networks that can recognize digits in PyTorch or TensorFlow, right? Like, that is not rocket science per se. That is difficult, but,
But it's a whole other level of difficulty when you're trying to do that with C, right? And when you're trying to do that with CUDA. I think that's a pretty standard kind of first programming assignment. I think several curricula, several MOOCs — Massive Open Online Courses — will use that as one of their early assignments, because it is a really good way to quickly get a feel for training something and actually having something that works. A predictive engine, I guess, would be what it's called — it predicts.
It figures out what digit that's likely to be. And it does that probabilistically, right? So no matter how terribly — like, my daughter, actually — silly anecdote, but my daughter —
She thought it was fun to draw a line through her sevens or something like that, because a normal handwritten digit is, like, a plain seven. And it turns out she had a perfect score on this piano exam, but on one of her questions, she thought she'd draw a fancy seven. And it looked like a one. And...
We all looked at it, we're like, that's a one. Right. And she was so upset, because she tried to make her seven fancy with the fancy digit, and it just wasn't recognizable. Anyway, that is, like, a silly story, but it was devastating. It was an entire afternoon of gloom over her trying to make her numbers too fancy and them not being intelligible by humans.
So even with humans, there's a little bit of fudge, right? Like, if you've ever tried to read the handwriting of your physician when they're prescribing you something — doctors famously have terrible handwriting. And so, like, maybe an AI system that can interpret that would be very useful. And I'm sure there are plenty now. Yeah. You just get a quick, like, screenshot of it and then, yeah, send that over. What does this say?
Yeah, we should have taken a photo of it and just put it into GPT-4 and been like, hey, what number is this? And that would have just crushed her dreams even more. But so...
I love your approach of just grabbing something and, like, drilling in. And of course you hit that wall — you hit that hard surface that you're jumping on, and you realize it's not giving an inch. And there are many walls; there's not just one. So do you have, like, some sort of
approach to when you hit a wall? Like, obviously, if you think about a literal wall: you could try to climb over it; you could try to walk around it, and maybe the wall ends at some point; you could maybe dig under it; you could try to, you know, juggernaut right through it. But how do you approach it when you hit a wall and you suddenly realize, damn, I'm out of my depth?
Um, usually I'll get really sad and go on YouTube or Discord or Twitter or something and not look at the problem for, like, an hour. That's usually a sign where it's like, I'm done with this. Well, you can go over there. I'm not going to look at you. And then — it might be, like, an hour, or maybe, like, two days or something — I'll come back and I'll be like, wait, I think I might know how to do this. And then I'll, like, tinker around and do some more stuff. And then eventually, after enough cycles, it typically works. I mean,
A lot of people see me as disciplined. I'm not that disciplined. If there's something hard that comes up, I'm not just immediately going to solve that the day of. There's usually a lot of procrastination. Nobody's perfect. But just to, I guess, get to the point, in terms of how I am able to address things,
Address meaning, I mean, I guess it's different in terms of if I'm trying to solve a neural network problem in PyTorch versus trying to debug some C or CUDA code, it's different, right? Typically, you'll get different types of issues when you're working with Python versus when you're working with C. So with Python, I mean, I use a text editor called Cursor, right? I mean, you know what this is. Cursor is amazing. Cursor is like AI-empowered.
Yeah, like, you have it. It's a fork of VS Code. So you have your files on the side here, and then you have your code. You have your code at the top, and then you have your terminal, which you can, like, flip up and down. And then you've got the chat box on the side, and you can select which model you want. So there's, like, OpenAI o1. There's, like, Claude 3.5 Sonnet (New). There's, like, all the different ones you can use.
I typically use the Claude 3.5 Sonnet ones. They're just, like, the best at coding. It's not even close. Sometimes o1 is good, but I've generally had good experience with Claude. Yeah. So I'll typically, like, look at a file. I'll try to infer, with my own brain, without using the AI to think for me. I'll usually look at it and be like, hmm, what's wrong here? There's, like, an underline under this. Is that a function that we can use? Does that exist in the documentation? Or did you make that up?
Did you make that up when you were generating it? Right. And then I'll like, I'll like maybe look at the docs or I'll be like, Hey, did you make this up? And it'll be like, Oh yeah, I actually did make that up. And then it goes and changes it. And it's like, Oh, it works now. Yeah.
Yeah. Yeah. Like, this happens all the time. Like, every single day I'll be writing — like, right now I've been writing a lot of Java for Minecraft mods, and yeah, it'll hallucinate some function in the mod, in, like, Minecraft Forge or Fabric, whatever I'm using. And I'll be like, um, are you sure that exists? It'll be like, ah, got me. Yeah. It'll go and correct itself. Um, but yeah, I think AI-augmented coding is, like, really good for being able to go through things really, really fast. Um,
Being able to, like, go and write some boilerplate — the boilerplate stuff is tedious. I mean, I don't just want to sit at a keyboard and type for an hour straight. It's like, okay, I have this idea in mind. Go and make a template of this and I'll upgrade it later. Just get the base functionality working such that I can add things onto it, right? That's the part that's, like, intellectually hard for people.
It's not like necessarily boilerplate. That stuff is just like annoying grunt work, which you don't actually have to do that much of anymore. It's more so like what is the actual engineering problem I'm trying to solve? Why is this problem so hard? Where can I find a solution to it? That stuff, most of that should be done in your brain, right? Yeah.
Right. You can use language models to help you — like, maybe if you're a little bit naive to the topic, you can use them to help you. But generally speaking, you want to use your brain for those tasks. You don't want to just have the language model go off and spin on it. It'll burn your API credits so quickly. So.
Yeah, just pop it into Cursor and just... whether that be in a Jupyter notebook or just in a C file or a CUDA file, just try things out. Boilerplate, and then add this. If I want to test a tensor core operation, I'll be like, okay, where do these exist? I'll look through the NVIDIA docs and I'll be like, oh, found it. Can you go insert this, please? And then it's like, oh, does it work or not? And then I can sort of look at the... yeah, just break it down depth-wise, right? Typically...
This is generally how I approach building the whole CUDA thing. Like, all the CUDA code in the course, as well as the side project of, like, identifying digits, is I'll, like, use Cursor and speed through something super quickly to make sure that it works. Like, worry about robustness later, right? Just try to, like, get an MVP working. Yeah.
And then if it works at some point — which I'm usually able to get working in, like, maybe a day or two — then at that point it's like, okay, now let's go and break this down. And if I don't understand something, I'll be like, hey, what's this? And then, you know, you have all these tools at your disposal, right? It's just a matter of how you use them. You have, you know, really, really powerful language models, and you have all this documentation you can sort through. Um,
And then, of course, you can use your own reasoning up here. Don't ever forget that thing. Yeah. Yeah. I mean, like, the human brain is still way better at doing things than, like, entire organizational structures or entire, you know, sophisticated distributed software systems. The human brain — just don't sleep on it. Don't sleep on your own brain and your own ability to, like, come up with novel solutions. Yeah.
Yeah. That's awesome. And I'm glad you're getting so much knowledge out of it. I'm a little scared for the...
I'm a little scared for the next generations on how powerful these things are going to be. And my children might just be able to pull up another fork of VS Code or something on their Mac M8 Max or something when that comes out. And they'll just be flying through. And I'll be like, did you write that? They'll be like, no, Claude 7 did it for me. Yeah. Right? Or GPT-8, whatever it is.
There's no substitute for actually understanding what's happening under the hood. And by going down to the C layer and the CUDA layer, you are kind of going several layers deeper than most developers frankly do. And I think that's totally admirable. I want to talk to you about this. You've said that college is a backup plan. Like, do you technically need to finish your university degree?
Maybe not. You may be able to just go out and get a job. I mean, the market is very hot for people who understand how to do CUDA development, right? And yet, there you are. I think you're a sophomore. Is that correct? University? Yeah, I'm a second-year computer science major. Yeah. And so can you contrast the kinds of things you're learning through your CS degree? And I will just preface this with saying, like,
I'm a huge advocate of going and getting a computer science degree, unless you're mid-career or later in life. It doesn't necessarily make sense to go back to university. But if you have the means to study a subject at university, you should study computer science. I strongly believe that it's the most remunerative, most flexible Swiss Army knife type degree that you can get if you don't know what the future is going to hold.
So I'm sorry. I'm, like, biasing your answer probably, Elliot, and, like, putting all my, like, beliefs on the table. But I genuinely tell people, like, you'll hear people trash CS degree. Oh, they're teaching all this old stuff from, like, 30 years ago, 40 years ago. Like, everything that is new is built on the old stuff, right? Like, SQL, Python, you know, C, Linux. Like, these are older technologies, mature technologies that everything is built on.
Right. So it does not hurt to go back and learn them, and they haven't changed that much, right? Over the, you know — like, a lot of the concepts are still in play. I mean, I generally think that with the CS degrees, it's less about teaching the old stuff. I mean, of course you need to learn C, you need to learn Python 3, you need to learn, you know, SQL, whatever it is. Um,
I don't think it's a matter of learning the stuff. I just think it's a matter of how they teach it — the methods for how they go about teaching. Like the labs, for example: they'll give you, like, a take-home assignment. Maybe that's not very fun to do, right? I mean, maybe that's just, like, preparing you — doing your chores — so that when you have tasks that aren't fun in the future, you're prepared for that. But,
But, I mean, based on — I guess I've technically had, like, one and a little bit of years, because the new one, the second year, just started — based on my experience, um,
I mean, for beginners, for people who have never learned to code before, I think it's okay. But if you're really serious about it, I think you can do it faster. And by understanding that doing projects and just experimenting and tinkering with stuff literally all day is like a faster way to go, I think you can just completely blow the CS degrees out of the water. Yeah. So I guess then my question would be like,
Are you planning to finish your degree? And I don't know if your parents are going to listen to this and hopefully they're not going to be like, what? I have to be careful with what I say here. My dad's going to watch this and he's going to be like, hey. No, but let's say hypothetically Microsoft or some other big tech company or just a more specialized company like a game dev company or something reached out to you and they were like, hey.
Would you like to do an internship? And then after that internship, we're like, hey, would you like to stick around and just work here and not finish your school? Because we'd rather... Employers will always try to... If they think you're ready, they're not going to want you to be dividing your focus between school and your work. Because... No, it's either full-time or no time. Yeah. I mean, they might... There's night school and some...
You know, schools will even pay for you to go back and get your master's and stuff like that. But, like, I think right now, with the rate at which things are moving in AI and the amounts of money involved, frankly, companies are going to probably demand that you have, like, a sole focus — or they may not demand it, but they're definitely going to
lay it on. And every time they talk with you: you're still doing that school thing? So let's say hypothetically you get an internship. I don't know if you've already lined up internships — you're pretty early in school. But let's say hypothetically you do an internship and they do want to convert you into a full-time employee. How tempted would you be to leave school for industry? Absolutely. 100%. 100% tempted? I mean –
Like, just talking right now, I have a few opportunities, but nothing is, like, guaranteed. It's not guaranteed. I mean — I don't know, I have to be careful with this, because, like, we're releasing this in whatever it is, a few days, a week, whatever it is. And that might change in the next week, or the next month, or the next year. And it might change a lot. Yeah. So –
Well, I've had friends whose offers have literally been rescinded after they left their job and were doing this stuff. And they got severance and all that during the big layoffs and stuff like that. So nothing is guaranteed — definitely take that as a given. But assuming that at least one of those opportunities came to fruition — and if you've got a basket, it –
Diversification reduces risk, right? Like you've got a basket of potential opportunities. Let's take for granted that like at least one of those is likely to work out.
Yeah. Well, if one of them is likely to work out and actually does — like, you know, do an interview, whatever, get the offer — I'm not going to school for the rest of the year. Simple as that. Okay. Like, literally anything to get out. I mean, I don't learn that much from my degree anyways right now, I kid you not. I'm about halfway through my data structures and algorithms course in Python. Okay. So it's not too crazy yet.
I mean, if I left now, I'd probably miss out on some of the more fun stuff later. But it's totally worth it to just go out and work on cool AI stuff in a lab or something and work with really, really cool people there. Yeah. I would instantly take that. Yeah. Well, I'm glad to hear that you're candid with yourself and you've thought about this stuff. Because you're probably going to face a lot of pressure, not just familial pressure, but from your classmates and everything, maybe your profs.
Does your school have the ability to let you go back? Like, Stanford is famous for — you can leave and come back all the time. And people do. People, like, go and found some startup that doesn't work out, and they come back and they finish their degree, because the degree from Stanford is worth quite a bit on the job market, and you learn a lot along the way. And so they have kind of, like, this porous membrane that you can pass through either way, right? Um,
What is it like up there where you are? My knowledge, of course — I used to live in San Francisco and the Bay Area, and I know a lot about the tech ecosystem over there, the proverbial mecca of software development. But I know a lot less about the Canadian market — Edmonton, for example. What is the employment market like there? Are employers friendly with people that
dropped out of college, like they might be in Palo Alto, for example? So, I mean, here we have the University of Alberta. We have MacEwan University, which is what I go to. And we have NAIT, which is what I used to go to as well. As for...
As for the job market, I'm actually not too sure. I haven't explicitly looked for opportunities in my city, mainly because the big stuff is over in, you know, like, California or Austin, or even Europe, for that matter. Who knows? So right now — my sample size is not very big, so I don't want this to be taken too heavily — but I've talked to
I think like one or two people in my city that are actually like actively recruiting software engineers. And every time it's been, do you have a degree? Yes or no?
You don't have a degree. We would love to work with you, but I'm sorry, we can't. So, I mean, I've shown them some of the stuff I've worked on. They're like, sorry, we can't. It's like, okay, your loss. Yeah, and, like, to be fair, companies are risk-averse, and one of the things they do to mitigate risk is, like, well, let's just hire people who've already finished a degree. And this is one of the reasons. Not everyone. Yeah. Yeah. Go ahead. No.
Not everyone is – like, every person that drops out of a CS degree isn't going to be like me or the next guy, right? So the risk aversion is really important. They have to be careful with that. But I also think it needs to be — like, the process where they actually look at the person and evaluate them, that might need to be improved, right? If you just look at any person at all and just kind of be like, oh, no, you don't have a degree, we can't take you — I think there needs to be more there.
Yeah. And that seems to be the approach a lot of companies in Silicon Valley are taking. I think the statistic I've heard is that if you grab a random engineer walking around, say, Google's or Facebook's campus, they're 20 times more likely to have a degree than not. But
these FAANG companies and other big employers are open to bringing people on without one. So that does mean the proverbial 21st person would potentially have no degree. And I do know lots of people who work in the industry
without a degree. A lot of them tend to be older, because they've just worked for a long time. Maybe they got started during the original Silicon Valley bubble, they had opportunities, and they never had to go back to school. But it is a viable path. So what you're saying jibes with what I think the typical job seeker's experience is going to be: even if they're exceptional, they may just be
dismissed out of hand because they don't check the box of having a degree. Which is one of the reasons I tell people to stick it out and get their degree if they can, unless you've got some really dramatic opportunity, or you happen to go to a school like Stanford where it's frictionless to re-enter as an undergrad if things don't work out. So, yeah.
One of the things I'm really excited to learn from you is where you think things are headed, because you have the benefit of having built a lot of these systems yourself, and of having observed the improvements in LLMs, which is a very specific aspect of AI, but arguably the fastest-moving and most promising area right now. I'd certainly love to hear if you think there are other promising areas, but
where do you think this is going? And what are your thoughts, as somebody who works closely with these systems as a software engineer and also reads a lot of research? Yeah. I mean, in terms of where things are going, if you look back
to the original GPT-1, I think it was, from OpenAI, all the way up to GPT-3, and you look at some of the core stuff in there, most of it was just scaling up the model. That's what most of this is: buy more GPUs, run these neural networks with larger hyperparameters, a larger batch size, a longer context window,
bigger embedding dimensions for the tokens, more layers. You're just scaling things up, right? But the part that really interests me is figuring out how to sort of break those scaling laws. Not break them,
but find different architectures, or find things you can add onto the existing architecture or swap in, that make the scaling better. So right now, say you have some base loss curve that represents the scaling of language models: essentially, the loss goes down as you scale.
You can predict with very high precision what the loss will be given the model size and the amount of training data you feed into it. These are the scaling laws that OpenAI and Google have talked about and published.
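To make that concrete, here's a minimal sketch of what such a law looks like. The functional form is the standard one from the scaling-law papers; the constants are roughly the fitted values reported in the Chinchilla paper (Hoffmann et al., 2022), used purely for illustration, not as any lab's production numbers.

```python
# Minimal sketch of a Chinchilla-style scaling law: predicted loss as a
# function of parameter count N and training tokens D. The constants are
# roughly the published Chinchilla fit; treat them as illustrative only.
def predicted_loss(n_params: float, n_tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7   # irreducible loss plus fitted coefficients
    alpha, beta = 0.34, 0.28       # fitted exponents
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling the model from 7B to 13B parameters at a fixed 1T training tokens:
print(predicted_loss(7e9, 1e12))   # ~2.05
print(predicted_loss(13e9, 1e12))  # ~2.02
```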
And it's mainly just scaling up the model, right? But if you add new architectural hacks in, for example, for those who know the attention mechanism, if you're able to find a better attention mechanism that does the same job but is more efficient, then you can add more layers, because you can fit more attention mechanisms in the same model size. Or you can get more learning, more quote-unquote intelligence, done per token predicted.
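For reference, this is the standard scaled dot-product attention that such efficiency hacks compete against; a minimal PyTorch sketch of the baseline, not any particular paper's variant.

```python
import torch
import torch.nn.functional as F

# Standard scaled dot-product attention, the O(n^2)-in-sequence-length
# baseline that "efficient attention" papers try to beat.
# Shapes: (batch, seq, dim).
def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5  # every token scores every other token
    weights = F.softmax(scores, dim=-1)        # attention weights sum to 1 per token
    return weights @ v

q = k = v = torch.randn(1, 8, 64)
print(attention(q, k, v).shape)  # torch.Size([1, 8, 64])
```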
Then ideally, that's something you want to push into your production models, right? Things like GPT-4 probably already have some of these in them. Or maybe GPT-4 doesn't and GPT-5 will. Who knows? But mainly there are a few things I look for.
The first is improving the quality of the data. When you're training a model, you want the data to be as high quality as possible. If it encourages active thought, if it encourages the model to behave in a way that produces better outputs, that's a really good thing. And that's a difficult problem, because it's hard to generate a lot of very high-quality data. It's very expensive. This is why they pay contractors millions of dollars to build this stuff.
And then there are architectural hacks. I have a ton of research papers on how you can change the transformer very slightly: maybe it reduces noise, or captures a different type of pattern, or this little hack is just more powerful, whatever it is. I like to look for stuff like that because it sort of
hacks around the scaling law a little bit. It makes the scaling better, so you can get the same amount done with a smaller model, or, if you keep the same model size, you get an improvement in performance. Or intelligence, though I guess eval performance is the proper term, since intelligence is hard to really quantify. But that's a discussion for another day.
Then there's the tokenizer, and so many other things I look for. It's not just the neural network, and it's not just the data. You have to consider everything. And when you find a bunch of things that hold up, and you aggregate them into the same model and that holds up, that's what these new model releases look like.
When you take a hundred different mini breakthroughs and aggregate them, and they all hold up together, that's what might be considered a breakthrough, or whatever term you want to place on it. Yeah. Typically it's a lot of small incremental boosts, but depending on where they sit in the flow, they can have outsized impacts on the rest of the flow.
You might find something very early, like an optimization in your tokenizer, that just happens to have a dramatic leveraging effect on the rest of the system. Yeah, there might be things that snap-change it like that, but again, it's still going to be aggregated across all these many little, or even rather large, changes that make the models improve. If you add one thing, it might be a snap change, or it might not.
Generally how it goes is you have a bunch of mini things, and you don't see one snap change in model improvement. That's what users see when a new model is released: we went from here to here. What the researchers see is each little point of improvement. It's like, if you improve 1% every day,
then at the end of the year it adds up to an insane amount, right? The researchers get to see these little 1% gains each day or each week, and when you aggregate all of that, at the top it's like, oh my gosh, that's insane, I'm scared of this thing now, it's so smart. Yeah, just dramatic compounded returns. I agree.
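The arithmetic behind that compounding point, as a hypothetical one-liner:

```python
# Hypothetical illustration of the "1% a day" point: compounding a 1%
# daily improvement for a year multiplies the baseline by roughly 37.8x.
print(1.01 ** 365)  # ~37.78
```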
And I love what you said about how a model release seems like a huge step change, but that's because there are tons of tiny incremental changes: engineers spending time thinking about these different things, trying little experiments, all aggregating over time. Maybe we'll get GPT-5 in a year or so, but it's going to be
tons of tiny innovations at various points during the flow. And I say flow because I'm not sure exactly what it's called. The pipeline? I don't know what a machine learning engineer would call the entire process that ultimately produces a model that can do inference. But
sure, you get those benefits. And to a consumer who's like, whoa, this is so much better than it used to be, it can seem like you should extrapolate it out, like Skynet's going to happen next year because look how much better it got this year. But
there are, in some respects, diminishing returns as well. We've talked about diminishing returns to sleep, and diminishing returns to spending time thinking about a problem. The law of diminishing returns really is one of the most fundamental, important phenomena in nature, and it applies to machine learning research as well.
What do you think is likely to happen? It's just a wild guess, just fire at the barn. But what do you think is likely to happen over the next five years in terms of model growth? Reading all these papers, do you think we're going to keep getting these dramatic step changes, or will things slow down? What's likely, based on your current understanding of the science and the engineering?
So it's super funny. Whenever someone puts out a bold statement about what they think is going to happen, someone might take it and put it on Twitter, or X, or something, like, what is this outrageous prediction from Elliot? What the heck? But I guess,
ignoring that, I think there are a lot of variables, right? Please don't do that, by the way. If you're listening to this, please don't call him out like, what is this madness? Put it in context. Tell people to listen to this podcast so they have the context of the conversation, instead of some cherry-picked excerpt that makes Elliot look like he's trying to be some futurist. Yeah. I mean, I guess there are
a lot of variables, right? It's not just one grand thing that I think is going to happen. I think right now the research progress is pretty good. If I keep those variables intact, I can say maybe a Claude 4 or a Claude 5 at some point. Think of the current Anthropic Claude 3.5 Sonnet, the new version,
but maybe way better at coding tasks. Maybe it gets 95% of questions correct and makes you go super speed. That's an okay thing to imagine, I guess naively. But in my view, I don't think it's just going to be transformer scaling forever. It takes a lot of compute to get a really smart model, and I don't think you'll be able to just scale GPUs forever. I think there are going to be much more efficient ways to get similar tasks done.
So instead of taking the classical GPT-style next-token prediction and just trying to put as much intelligence as you can into the text as it samples the next words, I think
that's easy to work with, but it might not be the best way, right? So when you take things like the Cursor text editor, you have these super-accelerated researchers who are able to make rapid progress because the AIs are literally helping them write code faster, and those AIs improve when the researchers build better ones. It's recursive improvement, you could say. And then not just applying that
exponential improvement to the transformer itself, but finding some other architectures. Not the transformer; throw it away. Or don't discard it entirely, but take some ideas from it and then try something else, right? I think there could be a lot of exploration and potentially very good opportunities in there. Again, that's
completely dark territory. You cannot see anything, you don't know where you're going, and there's not that much literature on it. But when you start navigating into it, there's a lot that could be explored. You don't know what the potential is.
Right. So I think when you're able to test things really quickly, with fast text editors like Cursor, you're able to find these non-transformer things more quickly. And if we get something like that, which I think is likely to happen, we might see something completely alien, even more alien than transformers are right now, you know?
Yeah. And it does feel almost like you're talking to an alien when you're talking to GPT-4 or Claude 3.5. When people see these AI movies, the robots taking over or whatever, that's not what transformers are right now. Transformers take some input, they have all these tokens attending to each other, lots of features being dissected, and eventually the model just spits out what's most likely to be the best answer, the next word it wants to predict, right?
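A minimal sketch of that loop, with `model` as a hypothetical stand-in for any decoder that maps a token sequence to next-token logits; real systems sample from the distribution rather than always taking the argmax.

```python
import torch

# Greedy next-token generation: call the model repeatedly and append
# whichever token it scores as most likely. `model` is a hypothetical
# decoder returning logits of shape (batch, seq, vocab).
@torch.no_grad()
def generate(model, tokens: list[int], max_new: int) -> list[int]:
    for _ in range(max_new):
        logits = model(torch.tensor([tokens]))    # score all positions
        next_token = int(logits[0, -1].argmax())  # most likely next token
        tokens.append(next_token)                 # feed it back in
    return tokens
```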
It might not be like that. That's not what those movie robots are like, right? Those robots are conscious entities. They have recurrent loops in their brains, going back and forth. Recurrent neural networks, like human brains, not like LLMs currently. Yeah.
Yeah.
So I feel like it'll be something like that, where you have these recurrent back-and-forth loops. Or maybe it'll be something other than recurrent, I don't know.
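For contrast with the feedforward sketches above, here's a minimal recurrent cell in PyTorch, where the hidden state loops back into the next step; purely illustrative sizes, not a proposal for any particular architecture.

```python
import torch
import torch.nn as nn

# A minimal recurrent network: unlike a single feedforward pass, the
# hidden state h is carried forward and fed back at every time step.
rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(1, 10, 32)  # a sequence of 10 input vectors
out, h = rnn(x)             # h is the state that loops "back and forth"
print(out.shape, h.shape)   # torch.Size([1, 10, 64]) torch.Size([1, 1, 64])
```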
But I think there's some potential there. Well, it's exciting to hear that you think the tools we're building now, these LLMs, may help us discover other branches, other paths toward even more powerful systems. They may accelerate us in arriving at those systems. That's really cool. I have one last question for you; I want to be mindful of your time.
You're somebody who reads these papers. I don't know many 20-year-olds who hang out and read academic papers about pioneering technologies. And everything is usually written about before it's implemented, so you can see where things are going by reading the papers. How do you approach reading academic papers? Isn't it super-duper intimidating?
Yes, it is absolutely super intimidating. I remember some of the first papers I ever tried to understand and parse a little bit. It was definitely really scary. You scroll through the thing, not trying to brute-force read it top to bottom, just looking at it, like, okay, what am I dealing with here? And you see all these formulas and weird math expressions, and you're like,
what the heck? No. And then you click off. No thanks. But, I guess, just going through those:
right now I don't break papers down to the depth I'd want, mainly because of time. I don't have time to go through AI papers all day. But I do what I can with them. If I'm going home on the train or the bus from school, I'll whip out my binder, go through a paper or two, skim it, get the idea, and try to ask questions about it.
Typically, when I'm going through a paper, I'll look at the abstract first. You just want to understand: okay, what the heck are they doing? What's the general idea here? Even if it doesn't click right away, you still want to understand what the point is. Associate the title with the abstract: what are they getting done? Then you go and look through the sections. You look at the bold text: introduction, related work,
experiments, results, all of that. You look at all these sections, and you look for keywords, bold words, maybe images, and you try to understand, to get some more context.
It also depends on what you're looking for. If you're just trying to naively learn some idea from a paper, you can go down that route. But a lot of people look at a paper trying to understand: how much does this actually benefit me? If I were to go learn this, how much would it improve my research or engineering? So
sometimes people will literally just skip ahead. They'll read the abstract and go straight to the results. How good is this, actually? That's kind of how I do it, frankly. Yeah, right? Is this actually good? I think that's a safe thing to check first: whether it's actually good or not. You see whatever benchmarks it's evaluated on, and then go from there. But if you're looking at the results and you're also trying to
learn, and potentially build the thing, then I think it's important to
understand which sections you're reading. Look for the keywords when you're reading something. Don't just assume everything is going to make sense in your brain, because it's not. You're going to see a lot of words, probably 20% or 30% of the important words, that you have no remote clue about. So you circle those on your piece of paper, or highlight them in your doc, and
you ask the question and dig deeper into it. What does this mean, right? You pull up maybe a language model, or Google Search, or Perplexity for that matter. Perplexity is a great tool that I use. Yeah, it's an AI-enabled search engine.
Yeah. I literally hit Command-Shift-P on my keyboard, a little search tab pops up in the middle, similar to Mac Spotlight, and I type something in. It searches through links across the internet and gives me the best ones. It's really nice.
When I'm looking at a paper, I'll be selective about which paragraphs I read and in which order, but within them I'll typically read top to bottom, because I think it's a bad idea not to. You want to get what they're getting across, and you circle the words that don't make sense. I mean, it's structured that way for a reason, right? Papers are structured that way because that's how the authors and the editors of the academic journal want you to read the article.
In that linear way; that's why they structure it that way. So, sorry to jump in and state the obvious, but I commend you for actually using it as intended. Yes. Yeah, it's important. And apologies if I missed some of the details. I'll typically go through it
top to bottom, in very select paragraphs. You pick which paragraphs you're looking at; you don't just read it through like a storybook. That's always a bad idea. You learn that very early on. Or at least I found it to be not very useful. But yeah, you can separate it into: what have other people done?
What have they done in here? Usually the core of it, where I take the most notes, is where they have some algorithm outlined, with a text description of exactly what they're doing and all the different variables they're using. Putting those together, and projecting them onto text or a drawing or whatever, in a way you can make sense of, I think that's really important. You know, being able to
take it, and, once you've read most of everything else and you get to the section on the core algorithm itself, the proposed algorithms: how quickly can you implement that in a Jupyter notebook, in PyTorch or something? How quickly can you go from A to B? I find that to be one of the core problems there. I know
people like Andrej Karpathy, and, I don't know if Ilya Sutskever does it, but Alec Radford, whom I've heard lots about from OpenAI: these people will just look at something, or have some idea, and it's, how quickly can I put this into a Jupyter notebook? Get the cells working, make sure they don't throw errors, done. Then you test and run experiments on the thing once it actually works, and go from there. So
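As a hypothetical example of that paper-to-notebook exercise, take RMSNorm (Zhang & Sennrich, 2019), whose algorithm section reduces to one formula; it's the kind of thing you can turn into a working cell in a few minutes. This is purely illustrative, not an example any of those researchers has given.

```python
import torch

# RMSNorm as a notebook cell: normalize by the root-mean-square of the
# activations, then apply a learnable gain. One formula in the paper,
# a few lines of PyTorch here.
def rms_norm(x: torch.Tensor, gain: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * gain

x = torch.randn(4, 64)
g = torch.ones(64)           # learnable gain, initialized to ones
print(rms_norm(x, g).shape)  # torch.Size([4, 64]); cell runs, now experiment
```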
I guess it does vary depending on the types of papers you're reading. Some of the papers I have are very heavily CUDA-performance based. And then some of them
are more data-based. Not database; data-based. Maybe they use some sort of post-training mechanism on the neural network to make it specifically smart about a certain thing, or to not give unethical responses. So, I guess it depends. But I usually hold to the structure of searching depth-wise, like a tree. You start from the top
and organize your way down to the bottom. It's not going to be one-size-fits-all, right? A lot of this is me hand-waving, throwing vague terms around. A lot of it is figuring out what works best for you. But to my knowledge, this is what I find easiest. Yeah, and I imagine as you gradually build up more background knowledge and
more familiarity with the field you're studying, it becomes much easier to tease out what's actually interesting and relevant. But for a first paper: in your intro-to-LLMs course that you published a while back, you include three academic papers. The famous 'Attention Is All You Need' paper, 'A Survey of Large Language Models,' and 'QLoRA: Efficient Finetuning of Quantized LLMs.'
Would you say those three papers are a good place to start? If you just want to dip your toe into reading academic papers, which are written for scientists and people with PhDs rather than for lay people, could aspirational paper readers start with those?
Yeah. I mean, times are changing very quickly. There are new papers coming out every day, so it's hard to say those specific papers will hold forever, and they might not be the absolute best choice I could have made, but I think they're pretty good, and I'll give my reasoning. You have 'Attention Is All You Need,' which is the core of the actual thing I'm teaching in the course. Okay, this is what everything is based on. Learn this. This is important.
Then you have 'A Survey of Large Language Models,' for people who just want to look at different trained language models, see how they compare, and get a rough, high-level idea of how things work without getting super technical like we do in the Jupyter notebooks in the course. And then the QLoRA paper is no longer just high level. It's about a special, super-efficient way to
quantize and fine-tune existing pre-trained language models. I don't want to get too deep into the technical details, but you have a smaller matrix that you do operations with against a larger matrix, which is the bigger network, and essentially
you're able to be very efficient about how you train it, because fewer operations are being done. You're not training the whole thing, updating every single weight. You have this smaller thing that you multiply into the bigger one, and that smaller part is what you're actually modifying. So it's a lot more efficient. And then quantization, which is another technical thing, is when you go from FP32 weights to FP16 or FP8. Yeah, basically breaking it down into fewer bits.
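A minimal sketch of that low-rank idea, assuming nothing beyond base PyTorch: the big weight matrix stays frozen, and only two small matrices are trained. This illustrates the LoRA-style adapter the QLoRA paper builds on, not the paper's exact implementation (QLoRA additionally stores the frozen weights in a 4-bit quantized format).

```python
import torch
import torch.nn as nn

# LoRA-style low-rank update: freeze the big d x d weight W and train only
# the small matrices A (r x d) and B (d x r), so the trainable parameter
# count drops from d*d to 2*d*r.
d, r = 512, 8                               # model width, low rank r << d
W = torch.randn(d, d)                       # frozen pretrained weight
A = nn.Parameter(torch.randn(r, d) * 0.01)  # trainable down-projection
B = nn.Parameter(torch.zeros(d, r))         # trainable up-projection, zero-init

def lora_linear(x: torch.Tensor) -> torch.Tensor:
    return x @ W.T + x @ A.T @ B.T          # base output plus low-rank update

x = torch.randn(2, d)
print(lora_linear(x).shape)                 # torch.Size([2, 512])

W16 = W.half()  # the quantization idea in its simplest form: FP32 -> FP16
```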
Yeah, and quantization is something we do in music too. For example, if you're not a very good drummer, you can fix your drumming by quantizing it. I don't do that, but it is a shortcut to making your drumming sound better.
So thank you very much for sharing all these insights about drilling into papers. I'm going to encourage everybody listening to check out Elliot's courses, which I've linked below. You can read more about him on his really cool personal webpage. Shout out to maintaining a personal webpage in 2024. I encourage everybody to go out there, purchase a domain, and own your own destiny, your own real estate on the internet, by building a personal webpage like Elliot has.
And man, it's just such a pleasure talking to you. I like your balanced takes on where things are heading, and I love your cautious optimism about how fast things are going to progress and improve. You have to be careful in this field, yeah. Yeah. I mean, there's a lot of hype out there, but there's a lot of reality too. These tools are
very useful. I haven't used Cursor yet; I mainly use LLMs for developing things conversationally. But it sounds like a really cool tool. And I'd encourage people to learn more from Elliot and keep checking out this podcast. I'm going to have a lot of other luminaries and
prodigies, if I may say so, people who are precocious and wise beyond their years, like Elliot, on here. We're going to get a great cross-section of smart folks. Elliot, I'd love to have you back in the future, maybe in a year or so, to talk more. Depending on how quickly things move, maybe sooner. It feels like there's so much to talk about.
I'm really excited to see how things unfold for you, in terms of the little improvements you make in your own process, your own learning, and your own development of tools and apps. And thank you for being so open, learning in public, and sharing the artifacts of your learning through the freeCodeCamp community and through your own YouTube channel, where you've got lots of cool little tutorials. Like your Raspberry Pi
GPT tutorial, which was really cool. I watched it this morning while eating breakfast. It's just cool to watch you wire stuff up and get things done. Permissionless innovation: just going out there, getting it done, and picking things up. Exactly. Yeah, man. It's about the journey, you know? Yeah. Well, thanks again. And everybody tuning in, have a fantastic week. Until next week, happy coding.