
Story: Hatetris - Obsession, Friendship, and World Records

2025/3/3

CoRecursive: Coding Stories

People
Adam Gordon Bell
David Freiberg
Felipe
Topics
David Freiberg: I became obsessed with the game Hatetris and poured a huge amount of time and effort into trying to break the world record. I tried many approaches, including writing simulators in Mathematica, Python, and Rust, and attempting to re-implement the AlphaZero algorithm. We ran into plenty of challenges along the way, such as limited compute resources and an incomplete understanding of the algorithm, but Felipe and I persisted and eventually succeeded. We also collaborated with other developers, sharing experience and methods, which kept pushing the Hatetris record higher. I was fascinated by Hatetris from the start and tried all kinds of approaches to crack it: I wrote multiple simulators and experimented with machine learning. Despite many difficulties, I never gave up, and together with Felipe I eventually broke the world record. Working with other developers taught me a lot of new things, such as the NNUE algorithm and beam search, and deepened my understanding of game development.

Felipe: I wasn't interested in Hatetris at first, but Dave's enthusiasm won me over and I joined the project. I mostly handled the practical side, such as choosing the right programming language and using version control and backups. Throughout the project I worked closely with Dave, exchanging ideas and data over Discord. We hit setbacks too, like limited compute and an incomplete understanding of the algorithm, but we succeeded in the end. Collaborating with Dave was a pleasure; we learned from and helped each other. Along the way I picked up new knowledge and skills, such as the Rust programming language and beam search, which deepened my understanding of software development. Working with other developers also showed me the importance of knowledge sharing: we happily share our experience and methods with others, and learn even more in return.

Adam Gordon Bell: This episode tells the story of how David Freiberg and Felipe took on the game Hatetris. From their first attempts to finally breaking the world record, it was a long journey that showcased persistence and teamwork, along with curiosity, collaboration, and the joy of discovery. Their story shows that even a seemingly impossible task can be accomplished through perseverance and working with others, and their experience and methods are worth learning from. The Hatetris record keeps being broken, which also reflects the importance of technical progress and knowledge sharing: through collaboration and communication, we can push technology forward together and create remarkable results.


Chapters
Dave and Felipe, driven by a new world record in the notoriously difficult game Hatetris, embarked on a challenging journey to conquer its high score. Their quest involved building emulators in various programming languages and tackling complex algorithms.
  • New world record in Hatetris sparked Dave and Felipe's obsession
  • They built multiple emulators in different programming languages (Mathematica, Python, Rust)
  • Hatetris's deterministic nature and complex algorithm presented significant challenges

Transcript


- Hello, this is CoRecursive and I'm Adam Gordon Bell. Today, as always, we're diving into the people behind a piece of software.

Because sometimes what looks like a simple puzzle can unravel into a web of complexity. Today we're exploring the story of two people who got caught up in a JavaScript game from 2010. It seemed like just another Tetris clone, but it quickly became an obsession that would challenge their skills, their wallets, even their relationships. My name is David Freiberg. I am a materials science PhD.

And I am very, very obsessed with a JavaScript game that came out in 2010. I'm Felipe. I'm a software engineer at HubSpot. I'm not quite as passionate about this problem as Dave, but I've gotten dragged kicking and screaming into it, so I now guess I'm a domain expert on it. The game is called Hatetris. If you've played Tetris, you know that moment when you need just the right piece.

Like I've got it all lined up, if I can get that four in a row, that I piece, I can clear four lines.

Now picture a version of Tetris where you never get the piece you want. Where the game seems to have it out for you. That's Hatetris. It's a hard, hard game. And for years, it had a seemingly unbeatable, very, very low high score. And then these two friends, Dave and Felipe, got obsessed. They poured months and months of their lives and a bit too much money into toppling that record.

And depending on when you listen to this episode, because the competition for this game is actually fierce. But as of right now, Dave and Felipe are the current record holders, world record holders. But before we get into all that, let's go back to 2010 and back to the great science fiction writer, Sam Hughes. Back in 2010, Sam Hughes created Hatetris, just published it on his blog as a novelty. About a month after the

game was published, a Japanese Slashdot commenter named Diasuke got 30 points. And it stayed there for years and years and years. 2015, I took a class in general game playing on MIT OpenCourseWare.

And this was an interesting class. Instead of writing a program to play chess or to play checkers or to play connect four, you were writing a program that could play any game given the rules. Instead of just being given a position, you were given a position and also all of the relevant rules that went with it. I started reading QNTM and I started looking at Hatetris and I messaged Felipe a couple of times asking him, could we apply it

to Hatetris? Could we apply this general game playing? And Felipe quite wisely pointed out that my general-purpose game player could barely beat me at Connect Four, and it couldn't beat me at checkers, and Hatetris is way, way, way harder than pretty much any human game. And yeah, Felipe was right about that, so we shelved it. So what happens next is that in June of 2021

A Japanese computational Tetris expert, and yes, there is such a person, named New Jade, sets a new world record. We thought it might genuinely be impossible to get past 30.

This max score, I believe, was based on simulations that they had run. Unlike Tetris, Hatetris is totally deterministic. You always get the worst piece based on the well you have. And even in narrow wells, wells being the Tetris term for that place where the pieces fall, even in narrow wells, the game state ballooned faster than you'd think. So 30 points, 30 lines cleared, it felt like an absolute ceiling. But then New Jade hit 32, so clearly they had missed something. And Dave started working on an emulator, and I was like, okay.

We're doing this, I guess, because Dave has started working on the emulator and I'm not going to be able to gently guide him away from the idea of doing this. That's what kicked off the project in my brain. That's where it went from impractical idea we're discussing, which is 90% of our conversations, 80% of our conversations to idea we are doing.

This is normal for these two, but before we get deeper into how they tackle this challenge, you need to understand something about them. Dave and Felipe, they're buddies from back in high school, but they are not casual in their approach to these challenges. Dave's the instigator of projects. Felipe's the one who reins them in with practical concerns. But whatever they're working on, they're always knocking around ideas.

Dave will send me some idea like, I was thinking about how to use machine learning algorithms to solve a game. And I will say, Dave, that is wildly impractical. How do we make your idea practical? Because I like the core of it, right? I really want to like,

Maybe we can't solve every game. Maybe we can focus on one game. And we'll narrow the idea. We'll refine it. We'll talk about it. And we'll come up with sort of a practical approach. And that's usually where the conversation will end, right? I'll be like, yes, we can predict all stock markets forever. But, you know, you could maybe focus on predicting one stock and you could use all these other variance factors to figure it out. And I read this good paper. Maybe you go look at that.

And then if Dave actually likes the idea, two days later to a week later, he will show up and say, I wrote a Mathematica script for this. Here's the data that I got from it. Here's the parts that don't work. I didn't use a database because I don't believe in those. And all the data is sitting in text files in my hard drive right now. And it's full of these. And I'll be like, Dave, I gave you like four practical ideas here. Why did you do it this way? And so honestly, most of this stuff goes as snippets of conversation had over Discord, right? It's like Dave sends me a message that said something along the lines of like,

I just tested the algorithm sorter and I got these results. And I say something incisive like, well, those results aren't very meaningful, Dave. What can we actually learn from these? And he's like, well, let me pull up Mathematica. I'll make a graph. And I'd look at the graph and make some more commentary on it. We'd have sort of a discussion about it, right? And the conversation would pick up on and off throughout the day as we're available. So I'll be like, okay, I have to go to make dinner and I'll go make dinner. I'll come back and Dave will have posted.

five more paragraphs in my Discord, and I go read them and make a bunch of commentary, and Dave's off swimming because he's practicing for an Ironman. Were you training for an Ironman at that time? I think you were. And so he's like, I just got back from swimming 8,000 miles, let me post some more thoughts. And I'm like, 8,000 meters, come on, Dave, your magical exercise numbers are meaningless to me. I do a lot of biking and I do a lot of swimming, so a lot of this is, I'll look at a whole bunch of data right before I leave

And then as I'm biking to the river or when I'm in the river, I have nothing else to think about or focus on. So I'm just sort of mulling over the data in my head. And when I get back, I send Felipe messages in the same way that cannonballs are sent from cannons sometimes. And so he'll be hit with a barrage of a couple thousand words based on whatever I was mulling over. Any reference to practical stuff like VS Code is going to be Felipe.

He forced me to use things like Git and version control and backup copies. Still haven't forgiven him entirely. I mean, we only lost most of a project to not keeping any backups and not using version control, so, you know. It's not like I managed to get you there until we did that. Yeah, this is very true.

So this is a thing that Dave and Felipe do. They work on projects together. They send around messages. And then at night, depending if they're busy or not, they might jump on a Discord voice call. They might get VS Code Live Share running. They might work on some coding. Sometimes the projects get somewhere. Sometimes they wouldn't. Sometimes at the end, they'd write up their work, share it on the blog. Most of the time, they wouldn't get around to that.

But yeah, Hatetris was always one of these topics on the back burner. It would come up now and again. Dave wanted to tackle it. Felipe pointed out it seemed so out of reach. So other projects would get priority. But like many back burner ideas, Hatetris was quietly preparing to take center stage. Because little did they know, once they truly committed to beating this very hateful game, there'd be no going back.

Because when a project grabs your interest, especially when there's a group of you, the effort you pour into it over countless evenings and weekends can be surprising. It can add up. So yeah, New Jade hit a new record, 32. And Dave thought maybe they could beat it. But first they needed to build an emulator. So...

We built not one emulator, we built six total. Started in Mathematica just to get something that worked and to get an idea of how slow things were. In Mathematica, a single move took about a tenth of a second to calculate. And there's a fair amount of computation that goes into Hatetris because the Hatetris algorithm doesn't give you a piece at random. It looks at the well and it calculates which piece is going to be worst according to a minimax algorithm.

So it makes sense that it would take a little bit. Our current one runs approximately 200,000 times faster. So currently we run about five microseconds per move per core. So even on a single thread, that's still a tens-of-thousands-of-times improvement. The very basic emulator consists of just trying every possible spot that a piece could go and doing the minimax algorithm from there. It was not very fast in Mathematica. So
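To make that piece-selection step concrete, here is a minimal sketch in Rust of a "worst piece" minimax like the one described. The `Well` and `Piece` types, the `legal_placements` helper, and the stack-height scoring are hypothetical stand-ins; the real HATETRIS algorithm rates candidate wells differently.

```rust
// Sketch only: types, helpers, and scoring are assumed stand-ins.
#[derive(Clone)]
struct Well {
    rows: Vec<u16>, // each row of the well as a bitmask of filled cells
}

#[derive(Clone, Copy)]
enum Piece { S, Z, O, I, L, J, T }

const ALL_PIECES: [Piece; 7] =
    [Piece::S, Piece::Z, Piece::O, Piece::I, Piece::L, Piece::J, Piece::T];

/// Every well reachable by dropping `piece` somewhere legal (assumed helper).
fn legal_placements(_well: &Well, _piece: Piece) -> Vec<Well> {
    // ... enumerate rotations and columns, hard-drop, clear full lines ...
    unimplemented!()
}

/// Best outcome for the *player* if handed `piece`: here, simply the
/// placement that leaves the shortest stack (hypothetical scoring).
fn best_outcome_for_player(well: &Well, piece: Piece) -> i32 {
    legal_placements(well, piece)
        .iter()
        .map(|w| -(w.rows.iter().filter(|r| **r != 0).count() as i32))
        .max()
        .unwrap_or(i32::MIN) // no legal placement at all: worst case for the player
}

/// The hateful step: hand out the piece whose best outcome is the worst.
fn worst_piece(well: &Well) -> Piece {
    *ALL_PIECES
        .iter()
        .min_by_key(|&&p| best_outcome_for_player(well, p))
        .unwrap()
}
```

The shape to notice is the two nested optimizations: the player maximizes over placements, and the game then minimizes over pieces.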

What Felipe eventually convinced me to do was rewrite it from Mathematica to Python, which was barely any faster, and then from Python to Rust. And this I will blame Felipe on. I will blame this on you because I don't think I would have gotten into Rust if you hadn't dragged me into it. I do take the blame for that particular approach. As background, we had no other experience with the language.

We did six Advent of Code problems, and our seventh thing was re-implementing Google's AlphaZero algorithm, complete with GPU neural nets from scratch. It was a bit of a leap. Yeah, AlphaZero, that DeepMind project out of Google, that was their inspiration. I mean, there is a published paper with formulas and stuff in it, so how hard could it be for the two of them to re-implement it all?

At the time when he brought up AlphaZero, I was playing a lot of Go and I had seen what AlphaGo could do. And so I was like, oh, actually, I do think this has a lot of potential. Let's talk through the actual practical implications of doing this. And at that point is when I knew this was the idea we were going to do, right? We took the parts that we understood and implemented them. And the key element being the parts that we understood. Because at some point, I remember we were talking about this and then the topic of Dirichlet noise came up. And I'm like, I don't know what that is, Dave. And Dave looks at me and says, I also don't know what that is.

And I'm like, so are we using it? And I'm like, I don't think we actually implemented that anywhere. Does it matter? And then we look and we're like, it looks like it could be important. Yes, but the problem is that when we actually implemented it, our results got worse because they were trained without the Dirichlet noise. Yes, because we didn't understand the paper too well.

Even if they didn't totally get all the details, they got an AlphaZero approach up and running. And then they could run it and they could watch the logs and they could see if it was actually learning anything and getting better. So you would see a text file. And it was a text file because Felipe never felt like writing a proper log. And so I did it. And what you see there is when we're in the AlphaZero phase, you would see a printout of a game.

And every new game that was played, you would get a list of lists of numbers where each list of numbers represents the well written as binary. You would get logs of these and each well would come with a score if it had gotten any lines. It would come with the network's heuristic estimation. It would come with a couple other bits of metadata like how many children it had and so forth.

And you would see tens of thousands of these. We would get one every few seconds and we would control F through to manually look for the high scoring ones. Most of them were not terribly high scoring, of course. I have a very fond memory of sitting with Dave on a voice call, actually, and just looking at games that had been played through and trying to figure out if there were commonalities or traits and what the network was trying to think of as we went through these things.
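The "well written as binary" format is easy to picture: each row of the well is a bitmask, printed one row per line, so the shape of the stack is visible while scrolling or Ctrl-F-ing through a text log. The snippet below is a hypothetical illustration of that idea in Rust, not their actual logging code.

```rust
// Hypothetical illustration of logging a well as "lists of numbers":
// each row is a bitmask of filled cells, printed as binary so the stack
// shape is visible in a plain text log.
fn log_well(rows: &[u16], width: usize, score: u32, heuristic: f64) {
    println!("score={score} heuristic={heuristic:.3}");
    for row in rows {
        // Most significant cell first, `width` cells per row.
        let bits: String = (0..width)
            .rev()
            .map(|i| if (row >> i) & 1 == 1 { '1' } else { '0' })
            .collect();
        println!("{bits}");
    }
}

fn main() {
    // A toy 4-wide well: empty row on top, partially filled rows below.
    log_well(&[0b0000, 0b0100, 0b1110, 0b1011], 4, 1, 0.42);
}
```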

And if I remember correctly, you made a little emulator that let us look at the games in Mathematica, and it was clicking through and seeing, okay, what happened here? Why did it do this? And then looking at a movement being like, that move is terrible. Why would it do that? Why can't the network figure out that it shouldn't be doing this? It was probably like one in the morning or something as we're doing this. But yeah, there is a problem with this AlphaZero approach. What we needed was a data center's worth of TPUs hooked up to a bunch of hard drives and computers running all day so that we could change meta parameters and test things.

So that was a big factor. Our implementation was the implementation of two people who read and half understood a paper and then skipped over some fuzzy details that clearly didn't matter, that it turns out probably kind of mattered. They were probably important. And I think we just fully didn't grasp the scope of what needed to happen in order to make this an effective

solution. So we were kind of like stuck in that circle of tweak a thing, it gets worse, but did it get worse because we tweaked the thing or did it get worse because we're in a local minimum and don't know how to fix it? And then we can't iterate on that because we don't have the hardware in order to iterate on it properly. So it was just a thing where it was like that feeling you get when you know you're approaching a problem in the incorrect direction and that yes, you can keep tweaking things, but nothing is going to get you out of this hole. That's the sentiment, I think.

Imagine climbing a mountain and then realizing that the peak in front of you you've been trying to scale is not the top, but just a false peak. The real peak is behind it. That's where they were. Resources were running thin and each passing day it felt like a warning. This problem was much bigger than they thought. You get some obscure error, like some Rust backtrace error that is a segfault. You've got an overflow somewhere. You're like, what does this even mean? Why is this segfaulting?

And we just look at it and either fix it or say, actually, we won't use this library anymore. It's fine. We don't need to use sparse matrices. That won't help anyway. But yeah, there were also moments of hope. My personal record for playing Hatetris myself is 10 points. So the moment when it first beat me, that was exciting. And there was a couple-week period where it slowly rose from single-digit points to 22 points. And that did feel like it was getting better.

And then it just stalled. And no matter what we did, we couldn't improve it. But there were moments in there where we were really happy. There were moments in there where we found ways of speeding up our emulator and getting more games per second. I'm not saying it was all bad. It's just the bad is the part we remember because the bad, chronologically speaking, was the majority of that time.

For months, they kept pushing on this problem, working late nights, exchanging code. But now, on the edge of a breakthrough, they just felt so much doubt. This is kind of one of the moments that happens in any given project. You reach the redirection point, right? Where you're like, I'm still interested in the core idea behind this. I still want to solve this problem. But whatever it is I'm doing isn't working for whatever reason. And...

Honestly, I can pretend that this is a measured decision where you sit down and you make a list of pros and cons, but really it's how annoyed am I at this? And do I still want to keep fighting with this specific set of problems, right? I think we had a conversation where I basically said, I don't think at the scale we have, unless we're going to shell out a ton of money to Google, we can do this. Is that when we applied for the TPU grant? Because we looked at this, we looked at like, what would it cost us cloud-wise if we wanted to do this properly? The answer was it would take about a month's worth of a thousand TPUs running full-time.

So we applied for a grant to get a month's worth of 1,000 TPUs running full-time, and we got the grant. And we never used it. Think about Merchant of Venice, the pound of flesh not including any blood. That's more or less the scenario we encountered here, where we had the ability to use for free thousands of TPUs for about a month. We did not have the ability to use for free any of the CPUs that we would need to coordinate those TPUs.

Or the hard drives. Granted, the TPUs were still doing the majority of the calculation, vast majority. But the cost, even of that, of the ancillary controls and logging and so on and so forth, would have been well over our budget. Our budget of $0 and a shoestring, to be clear.

So frustrated and realizing that this neural network approach they had been throwing themselves at was just beyond the compute resources they could afford, they were ready to give up. But then New Jade appeared on GitHub and started shaking things up. He posted a gist that included his entire approach in Japanese. And Google Translate was sufficient with pictures to figure out what was going on.

We'd been looking at Hatetris for quite a while at this point. So there was nothing in the explanation that we needed to have explained to us as far as like, how does the game work? What mechanism am I using? We were mostly interested in the broad strokes, like, what are you looking at? And he made some really good graphics actually, that really captured the idea of the shapes of things that he was looking at. And that was very helpful. And it was a pretty succinct explanation. It didn't get into a lot of the technical implementation details, which was exactly what we needed.

So it was just a really good write-up job on his part, even in Japanese, which we don't understand. We're like, none of this is that hard compared to what we're trying to do. I bet we could come up with a better heuristic. We're good at looking at problems.

And I think that idea sort of overtakes everything else. And we're like, okay, what does a heuristic look like if we're starting to think about this problem? What are the things we should consider that New Jade hasn't thought about looking at this problem? Because sure, he's a computational Tetris expert that's been studying this problem longer than we have and come up with a better solution. But obviously, we can do that. We can do better than that. We can come up with a better idea. And I think that's what starts things. Is that right, Dave? Yep. We decide that good artists borrow and great artists steal. And that is precisely what we did.

And then, while they were re-implementing, or stealing, or whatever you want to call it, New Jade starts posting new high scores. I think Dave sent me a message that said something like, the new record is, what, 66 points? And I was like, that, come on, that can't be right, Dave.

The last record was 45 points, and we've speculated that the roof is at like 50, maybe 60. Because of course, every time there's a new record, right, we raise the roof by like 10 points. I'm like, there's no way it's humanly possible. And at this point, I say something along the lines of famous last words. Oh, the maximum score is probably around 100 then. I think that makes sense.

And so we say, okay, let's just start by recreating the work, see if we can get something similar, and then we can start tweaking the heuristic to make it more suitable to our understanding. What we'll bring to the table is we'll do a lot more data analysis, and we'll have a better idea of a heuristic or things we can add to it. From there, it took us about five months because we made two crucial mistakes in our reimplementation, and each mistake cost us literally months of runtime.

Re-implementing New Jade's solution isn't super easy, especially if you don't have a PhD in computational Tetris or you're not familiar with computational game searches. Imagine exploring multiple paths of a maze simultaneously. You go 10 different ways through the maze and when some of them dead end or look like they're not promising, that's when you can branch off on some of the ones that do look promising so that you always have your best 10, your best N running in parallel forward through the game space.

You drop the rest and you focus on these winners, but you're always moving through multiple paths at once. And how many of those you move through at once, that is your beam's width. That's what makes a beam search. But simulating all of these demands a lot of computing power. The conversations with these things always go like this. How long could this take? I don't know, a day? Sure, a day seems reasonable. And then it's a day later. I'm like, how far has it gotten? Well, it finished the first beam. Okay, so two weeks later,
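A minimal Rust sketch of a beam search in that style is below. The `State` type and the `next_states` and `heuristic` helpers are assumed stand-ins; the real searches also deduplicated wells, ran across many threads, and checkpointed each beam to disk, which matters later in the story.

```rust
// Minimal beam-search sketch over game states. `State`, `next_states`, and
// `heuristic` are assumed stand-ins for the real implementation.

#[derive(Clone)]
struct State {
    lines_cleared: u32,
    // ... well contents, move history, etc.
}

/// All states reachable by placing the (hatefully chosen) next piece.
fn next_states(_s: &State) -> Vec<State> {
    unimplemented!()
}

/// How promising a state looks; this is where the "bird in hand"
/// style lookahead would eventually plug in.
fn heuristic(_s: &State) -> f64 {
    unimplemented!()
}

fn beam_search(start: State, beam_width: usize) -> u32 {
    let mut beam = vec![start];
    let mut best = 0;
    while !beam.is_empty() {
        // Expand every state in the current beam.
        let mut candidates: Vec<State> = beam.iter().flat_map(next_states).collect();
        for s in &candidates {
            best = best.max(s.lines_cleared);
        }
        // Keep only the `beam_width` most promising candidates.
        candidates.sort_by(|a, b| heuristic(b).total_cmp(&heuristic(a)));
        candidates.truncate(beam_width);
        beam = candidates;
    }
    best
}
```

The `beam_width` parameter is the number they keep quoting: 100,000 for a quick experiment, 25 million for the big AWS run.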

We were very hopeful it was going to run for less than a month, and then it was just much slower than we had expected. And so each run takes a month, and it's a month of looking at the logs go, seeing the score slightly increase, and being like, oh, God, I hope we didn't have a bug in this portion of the code, because if we did, we're going to get two weeks in and have to start over. And we're looking at log files, and we're discussing what's going on. And obviously, it's a lot of just refreshing log files and talking back and forth and being like, well, the newest high score is this.

I remember I would sign off on Discord. I'd see a message from Dave that would be like, the current score is 30. And then a message an hour later, the current score is 32. And we'd hop off a working session, right? And I'd hop back onto the console, connect to look at the logs, and I'd send an echo message to anyone else on the server being like, Dave, are you still online looking at the logs? I'd get a message back being like, yeah. Like, yeah, okay.

And so you did climb to 32. Well, what was the record at that point? The record at this point was 66. And we eventually got our way up to 53 on the best one of these searches, which was still very good. It was second best in the world. It was better than anybody else except New Jade, but it was not quite good enough.

And so that's when Felipe got the idea to, instead of using the single desktop computer that was at this point already five or six years old, instead, let's actually purchase some computer time and just get this over with in a day.

This is where things got serious. Instead of running this program at home as just something fun, they were about to rent a 72-core machine in the cloud, and the clock would be ticking. This was on AWS, and if anything went wrong, that money would be gone for good, leaving them with nothing but an unfinished beam search. Now this puzzle wasn't just about pride, and it wasn't just exploring; their wallets were now at stake as well.

Yeah, so I've done a bunch of DevOps stuff, but AWS is my most familiar cloud provider. So I was like, AWS seems like a reasonable choice for this. And so I went out and I priced it and I looked at various solutions. I said, okay, what we're looking for is a large number of threads. I forgot the instance type. This one has a lot of threads. If we're going to run this for 24 hours, this feels like a reasonable price to pay to get this project done.

And at this point we have a shiny brand new heuristic. We've added our own new magical fields to it that will totally make everything better. We've gotten 53 points, so we know that it can be done. And so I think the math we did was like four cores takes a month, divide the month in, and we need this amount of cores. And it was like, I forget how many it was, a few hundred. Do you remember, Dave? I think it was like 72 or something. I think it was 72. Yeah. So we said 72 cores, a lot more RAM. We can make this run quickly.

And we did some back of the napkin math and came up with a budget. And then I said, this is how much it's going to cost, Dave. And Dave was like, okay, we can go halfsies on it. That feels reasonable. Never trust my budgeting decisions ever. I have always, every single time, wildly underestimated the cost of doing the thing. And so we get it set up. I set up the AWS account. None of this is like very challenging because it's all just set up to be click and go, set up the instance. We SSH into the instance, copy your code over, because now we're using version control so we can just git clone the stuff over. You're welcome, Dave.

Install the dependencies, get everything going, and kick it off, right? And honestly, it's blazingly fast. You're like, wow, this is going very quickly. Exactly as we'd predicted.

By midnight they were glued to the log file, hoping each new entry would push them past 66 points, each beam would make progress, and one would pass the world record. So then problems start to come up. We actually shoot past 66 points. And then we start to realize, well, wait a minute, if the game lasts longer than we predicted, we're going to have to pay more than we predicted. This was at the 48-hour mark. This is twice the expected runtime, but their code still has to finish.

They have to let it run because it's cleared 66 lines. But until they lose the game, until they finish the search, the program won't finalize. It won't print out the score. So you'd hit a new world record, but you couldn't see it. You couldn't log it as a world record. You just knew that if there were no bugs in your program, it existed. So the better your program does, as you're sitting there watching it, the worse your bill gets.

And then there's Amdahl's Law. Because we forgot multithreading only works for the actual beam. When you're saving everything, when you're saving the progress from one time step to another, the file read and write, that's single threaded. That's not sped up at all. That was a negligible amount of time when we only had four cores. But when we have 70, that's now the majority of the time.
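Amdahl's Law makes that flip easy to see. The numbers below are purely illustrative, not their measurements, but they show how a serial step that is a rounding error at four cores can dominate at 72.

```rust
// Illustrative Amdahl's-law arithmetic (hypothetical fractions): if the
// single-threaded save/load step is just 2% of the work, it caps the speedup
// well below the core count and ends up dominating wall-clock time at 72 cores.
fn amdahl_speedup(parallel_fraction: f64, cores: f64) -> f64 {
    1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
}

fn main() {
    let p = 0.98; // hypothetical: 98% parallelizable, 2% serial file I/O
    let cores = 72.0;
    let speedup = amdahl_speedup(p, cores);
    // Fraction of the (shrunken) runtime now spent in the serial part.
    let serial_share = (1.0 - p) / ((1.0 - p) + p / cores);
    println!("speedup on {cores} cores: {speedup:.1}x"); // ~29.8x, not 72x
    println!("serial step's share of runtime: {:.0}%", serial_share * 100.0); // ~60%
}
```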

And so our cloud computing budget sort of retroactively increased till finally the game finished just before midnight. And we got 86 points. 20 point improvement on the record. Huge, huge, huge milestone.

This is a Friday night, right? My wife is like, "Are we going to do anything?" I'm like, "I have to stare at this screen and see if we're getting a record." It's like, "Okay, we have plans tomorrow. You remember?" I'm like, "Yeah, this is fine. Dave and I will be done with this by like 2:00 in the morning at the latest, right?" And so now it's midnight. We have a world record. We can see the world record, right? And we're like very confident there are no bugs. Yes. So the issue is at the end of the game, we now have a position.

and that position is the final position of the winning game. That's not an entire game. We need every position that every piece has gone in for the entire history of the game. To do that, we then reference all those save time steps we made earlier, which is also a single threaded process, which is also something we didn't anticipate. This is even slower than going forward. We thought it was going to take at most an hour,

And an hour later, it was barely started. We did some back-of-the-envelope calculations, and we predicted that it would finish by the time we woke up in the morning. It did not. By the time we woke up in the morning, it turns out our back-of-the-envelope calculations were based on the smaller time steps at the end, because the beam is running out of options. The number of wells stored at each time step is smaller. When we woke up, it was going to take us another 12 hours to finish. It might do. We're looking at our budget like...

I guess we can't stop it at this point, right? Because it's all gone, right? If you have no way of putting it together if you stop? Correct. If we stop at any point before it finishes, then there was no point to the entire run. We need every position of every well or else we just can't get there. By the way, this is peak software design, right? You should always design your processes so that if stopping them at any point destroys all your work, that is how you're supposed to design software. No backups. It keeps you on your toes, I think.

But finally we had the record. It was 86 points. We immediately downloaded it. We got the files transferred off AWS. We stopped the money spigot going into AWS. But they're not done yet. Their wallets are taking a break, but the simulator outputs its internal state for each move, which doesn't match the format that the high scoreboard needs. That requires a move-by-move replay. We took our list of numbers...

had a Mathematica script that turns it into a list of pictures. And each picture represents where the piece is going to go. And then we sit there on a screen share for an hour, manually hitting arrow keys to move the piece into the appropriate position.

And we vowed to automate it before the next record, and then we didn't, and the next record took even longer. Fortunately, Hatetris has an undo button, so it was possible to fix mistakes. It also has a replay code button where you can enter the replay code of a game. And that is very much preferable to doing it by hand. Because it's a deterministic replay code, so you can generate it by just knowing what the moves are. We just never wrote the script for it, so we were like, how long could this take? Finally, though...

We had it. And by this point, we had found New Jade. We found him hanging out on a computational Tetris Discord server, and we were able to get in touch with him that way. He sent us his congratulations. He mentioned offhand that he had his eyes set on a new approach, but that that approach was going to take a backseat for now.

They post their world record of 86 on the Hatetris site. It's a big jump, and they're finally landing a world record. They also tweet at Sam Hughes to share the news. Yeah, he congratulated us. He confirmed that the record was legitimate. He was extremely impressed at the level of effort we'd apparently gone through, because we mentioned at the time that it had taken us 11 months to get this. And yeah, from there, we write up a blog post

And what reasonable human beings would do at this point is stop thinking about this problem you have already solved and gotten the world record for. Of course, reasonable is an interesting word to use when you're talking about dedicating the resources of basically a small engineering team to the leaderboard of an obscure game from 2010.

Reasonable is not the point. And so they, in fact, did not move on. In fact, they planned to give a talk about all the work they had done. So we get invited to do the Rust DC talk. We spend like three weeks planning this talk and making slides. And our talk is way too long initially. And we cut it down. And there's a whole conversation of, does our audience want to see this fancy infographic I made? Maybe not. Maybe we just cut that.

And all this time, we're talking about this talk and about our project. And you look back at the project, and you look at your code base, and you're like, well, we could have done that slightly better. Well, that's a tweak we could have made. And now that we're looking at it, because Dave says, well, if we're going to give this talk, it would make sense to open source our code so people can go look at our mistakes. And I'm like, well, I don't want to do that without a readme, and Dave says, and we should probably clean up this horrible spaghetti mess over here where we have one giant method that does everything. We could add some unit tests, maybe, just so people don't think we don't ever write unit tests. We don't ever write unit tests.

While pondering all this, right, back steeped in their Hatetris work, Dave gets an idea for a new heuristic, where you just look a little bit more ahead. See if you can clear a line in the next couple moves.

And that gives you much more grounding for your heuristic about whether you're in a good place. He calls it the bird-in-hand heuristic, I believe, at the time. We have a whole conversation about how if this gets published, we can name it after him because that would make a lot of sense. And then I get a message like two hours later that says, I explained the idea to my dad. He said, oh, that's a quiescent lookahead. Yeah, that's used in chess engines all the time. I'm like, okay, that fits.
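A rough sketch of that quiescent, bird-in-hand idea: before trusting the static evaluation, check how many lines are actually clearable within the next couple of (hatefully chosen) pieces and reward positions where a clear is guaranteed. Every type, helper, and weight below is an assumed stand-in, not their actual heuristic.

```rust
// Sketch of a "bird in hand" / quiescence-style lookahead. All types,
// helpers, and the weight of 10.0 are hypothetical stand-ins.

struct Well; // placeholder for the real well representation

/// All (resulting well, lines cleared) pairs after the hateful piece is placed.
fn placements_with_hateful_piece(_well: &Well) -> Vec<(Well, u32)> {
    unimplemented!()
}

/// The original static evaluation (holes, height, surface shape, ...).
fn static_heuristic(_well: &Well) -> f64 {
    unimplemented!()
}

/// Most lines guaranteed clearable within `depth` further pieces.
fn lines_clearable_within(well: &Well, depth: u32) -> u32 {
    if depth == 0 {
        return 0;
    }
    placements_with_hateful_piece(well)
        .iter()
        .map(|(next, cleared)| cleared + lines_clearable_within(next, depth - 1))
        .max()
        .unwrap_or(0)
}

/// "Bird in hand": a clear you can actually reach soon outweighs speculation.
fn bird_in_hand_heuristic(well: &Well) -> f64 {
    static_heuristic(well) + 10.0 * lines_clearable_within(well, 2) as f64
}
```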

Your dad is a chess engine person? Is that like... He's a programmer, amateur mathematician, and the world's foremost Petri net enthusiast. I do not say world's foremost Petri net expert. There are people who know more than him. There is nobody who loves Petri nets more than he does, though. I grew up listening to cautionary tales about how if you're training a genetic algorithm, you've got to remember to include mutations or else you'll get stuck in a local maximum. You learn a lot through osmosis, even if you're not trying.

And so I still frequently call him up and talk about this sort of thing. And he also attended our Rust DC talk. He's always interested in new ongoing programming projects.

That's wild. It's funny to me that you could say like, oh yeah, and then I remember what my dad told me when I was 12, like consider a beam search before you go into it. Yeah, but that is actually the kind of advice he would have given me when I was 12. We, for many years, did Project Euler problems together. We're both in the top 0.1% of Project Euler solvers. And Felipe can attest to the getting dragged into that as well. He and I have actually solved the hardest problem on the entire site. Took us about a year on and off.

But what happens next is maybe as exciting, or at least as surprising, as solving a hard Euler problem. This was basically the first and maybe only really fist in the air moment where you're just, you can't believe how well something's working. Our huge beam search, the one that took days to run, that was a width of 25 million. So we were looking at 25 million positions in parallel.

So they test their new heuristic. They try it with a width of 100,000. That's 250 times fewer games in parallel than their big AWS run. But they have this better heuristic, right? So they're searching through this tree of possible games, but they're navigating towards these longer runs. They're simulating better games. And so even though it's 250 times fewer games, they get a score of 82 points, nearly their world record.

And to be clear, I did not believe this number. Dave told me this number. I was like, Dave, we have a bug somewhere. That can't be right. It's impossible that this heuristic got that good that fast. And I disagreed until we ran it for a million. And for a million, it gave us 148 points. Now, I am also convinced that there's a bug in the emulator somewhere. Because after Rust DC, we did a bunch of optimizations to make it faster. And I am convinced some optimization we did

caused there to be a bug and that bug caused the game to be easier. And it wasn't helped by the fact that we actually did have a bug earlier on in the implementation that we didn't catch for a while and it had been giving us false games.

So the thing to do, once again, they need to put together the gameplay. They need to verify the record. We hopped on a call. We said, we really, really should have written that script last time. Why didn't we do that? And Dave very sensibly said, because we thought we were done and we weren't going to do any additional work and never, ever again visit this project again. And I said, yeah, okay, let's make sure we write it next time. So you just got a screenshot. You just got an image that shows where the piece has to go, the shape of the well, essentially.

But it doesn't show how to get the piece there. There's a bunch that require complex rotations to get past tight corners and all that sort of thing. And our script did not include any of that, because if we'd done that, we would have just written an automatic replay code generator, which after this, we actually sat down and did, because that was too much. It takes us about two hours to actually input the whole thing.

Mind you, every time we make a move, I'm like, this is clearly where we're going to find the bug, right? Like, we're just joking about that every few minutes. It's like, obviously this move is illegal and this is not going to work. And so the old record was 86 and this was 148, which was the largest increase in Hatetris history, either in absolute terms or proportionally. There is no way anybody will top that. That's beyond any of our predictions for the maximum possible score for this game.

It's over. There's no way. Felipe, what happens a week later? Someone gets an even higher score. And this one's kind of on us. Dave has a friend that he does Advent of Code with sometimes. Dave hasn't said this, but he's often on the Advent of Code leaderboard because he's very good at that type of programming. I blame his Mathematica skills and his desire to be up at midnight solving a problem. And Dave is having a chat with his friend.

Did you show him the Rust DC talk? Is that right? Yes. He saw the Rust DC talk independently and then he messaged me. Yeah. So this is Tim.

They're talking, and Tim says, hey, I saw your Rust DC talk, and I actually know how to write Rust really well. So I made my own version of the emulator. And then he gets 157 points after Dave tells him about the bird-in-hand heuristic, because that's just the type of people we are. It feels like your dad should have had a recommendation for this. If you're in a competitive coding contest, don't tell the other people on the board your tricks.

I can't endorse that because every single time we've worked on a project and we've been doing something really hard and we've talked to someone in the same field, they've been like overwhelmingly happy to share whatever it is they're working with in amounts of detail that I consider like actual work. Like it was like, here's what I'm doing. Here's how I'm doing it. Here's what I'm thinking. Let me know if you want me to look at your code type stuff. So we take the same attitude. If someone asks us what we're doing, we're going to tell them in as much detail as we can because we're

Yeah, it's a competition, but the goal is not to win. The goal is to defeat Hatetris at its core, right? And like, break the will of the game, because that's what we're all after. Dave and I now spend the week sitting on various Discord chats as 170 gets posted.

And then 232 gets posted. And we're like, we really need to come up with a better way of doing this. Are these all your friend, or? This is all Tim. Yep. And Tim has a better computer than we do. I will give him that part. He's also better at Rust than we are. He'd ask Dave, hey, I see you guys did, what did we do? Used a priority queue. But you don't need to do that, you can just do this. And we're like, oh.

Yeah, I guess we could have done it that way. That would be much faster. Yes. Thank you, Tim. And so he's tweaking his heuristic. He's changing it. He's posting more scores. And Dave and I are, seething is not the right term, but we're definitely sitting in Zoom chats. We're like, okay, what do we do? We can't beat him on technical skill or on hardware. We need to do something to change things up. And we got nothing. Our conversations are like, maybe we can do an analysis and do a thing.

So, do you remember what day of the week it was? Tim posts, um, 232 points, and we're like, oh gosh, okay, well, I don't know if we can break the 200-point barrier. And 12 hours later, New Jade shows up on Twitter and posts a score of 289 points. We talked to him, we shared our heuristic, and, um,

He did something radically new, right? He used an entirely new set of techniques. But basically he posts, all right, guys, this is it. This is the definitive score. And we say, yeah, okay, I guess we're done working on Hatetris. So we started with neural networks and we abandoned it for New Jade's heuristic approach. And Tim also used the heuristic approach. He was just better at it than we were. While we were doing that, New Jade was busy abandoning his heuristic approach in favor of a neural net approach. But the neural net approach was very different.

So AlphaZero is designed by Google. It's designed by people who have millions of dollars of hardware. There is another, and objectively better, algorithm for neural net learning of chess that was developed by a Japanese chess (shogi) enthusiast, because shogi does not have anywhere near the resources that chess and Go do.

And so this is called NNUE. It uses sparse neural networks to dramatically accelerate how much effective compute you have. If you have a huge neural network, but 99% of the inputs to that neural network are zero, your network is effectively running 100 times faster than it otherwise would be. And he used this to make a world record setting Shogi program.
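The sparsity trick is easy to sketch: if almost every input is zero, the first layer only needs to touch the weight columns of the inputs that are actually on. The Rust below is a toy illustration of that idea with made-up sizes; real NNUE goes further, updating the accumulator incrementally as individual inputs flip between moves.

```rust
// Toy illustration of the sparsity trick behind NNUE. Sizes and weights are
// illustrative, not the real network.

const INPUTS: usize = 4096;
const HIDDEN: usize = 256;

struct FirstLayer {
    // weights[i] is the hidden-layer contribution of input feature i.
    weights: Vec<[f32; HIDDEN]>,
    bias: [f32; HIDDEN],
}

impl FirstLayer {
    /// Dense evaluation touches all INPUTS * HIDDEN weights.
    fn forward_dense(&self, inputs: &[f32; INPUTS]) -> [f32; HIDDEN] {
        let mut acc = self.bias;
        for (i, &x) in inputs.iter().enumerate() {
            for (a, w) in acc.iter_mut().zip(self.weights[i].iter()) {
                *a += x * w;
            }
        }
        acc
    }

    /// Sparse evaluation only touches the columns of active (nonzero) inputs,
    /// so with ~1% of inputs active it does ~1% of the work.
    fn forward_sparse(&self, active_inputs: &[usize]) -> [f32; HIDDEN] {
        let mut acc = self.bias;
        for &i in active_inputs {
            for (a, w) in acc.iter_mut().zip(self.weights[i].iter()) {
                *a += w; // binary features: contribution of an "on" input
            }
        }
        acc
    }
}
```

With binary features, `forward_sparse` over the list of active indices computes the same result as `forward_dense` over the full input vector, just without visiting all the zeros.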

And the shogi program has a name that comes right out of anime. It's called the End of Genesis TNK Evolution Turbo Type D. That's the actual name of the program. We had never heard of it. This came out in 2018. We had never heard that this was a possibility. So yeah, New Jade documented this approach for the record, which is how Dave and Felipe learned about it. He incorporated David's bird-in-hand strategy with his new neural network approach.

And this all led to the discovery of the very first loop. And we'll explore loops shortly, but by incorporating New Jade's insights back into their code, David and Felipe quickly hit an even higher record, 366, taking the record back from him. And then around this time, I pinged them about maybe doing a podcast episode. So in the same way that preparing for the Rust DC talk got us talking about the problem again,

When you messaged us, that also got us talking about the problem again. And so we started looking, not necessarily at a new approach, but at a new way of analyzing the data we already have. And we actually have a graph of what are called loops. Now, in theory, if you could have a position in Hatetris and then get back to that same position, you could run forever. And to prevent that,

the Hatetris algorithm has a built-in loop prevention rule, which, if it sees that you're going to get to a well that you've seen before, will give you some other piece so you can't get there. New Jade's 289-point record was also the very first record to ever discover a loop. And soon after, Tim discovered his own loop, and soon after we discovered some of ours. And these were moves which,

if that loop prevention rule wasn't there, would get infinite points. You could just play forever. And so then the question that we've been asking ourselves for the last couple weeks and what we've been investigating is, how many loops are there? And what's the structure of these loops? What kind of well leads you to a loop? Because obviously, if you can figure out that, you are a very good way towards understanding the game entirely.
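A rough sketch of how a loop-prevention rule like the one described could work: the game remembers every well it has shown you, and if the worst piece could only recreate a well you have already seen, it hands out the next-worst piece instead. The types and helpers are assumed stand-ins, and the actual HATETRIS rule may differ in its details.

```rust
// Rough sketch of a loop-prevention rule; types and helpers are assumed
// stand-ins, and the real HATETRIS rule may differ.

use std::collections::HashSet;

type Well = Vec<u16>; // rows as bitmasks (placeholder representation)

#[derive(Clone, Copy)]
enum Piece { S, Z, O, I, L, J, T }

/// All wells reachable by placing `piece` in `well` (assumed helper).
fn placements(_well: &Well, _piece: Piece) -> Vec<Well> {
    unimplemented!()
}

/// Pieces ordered from worst-for-the-player to best (assumed helper).
fn pieces_worst_first(_well: &Well) -> Vec<Piece> {
    unimplemented!()
}

/// Pick the worst piece that can still lead somewhere new.
fn choose_piece(well: &Well, seen: &HashSet<Well>) -> Piece {
    for piece in pieces_worst_first(well) {
        let leads_somewhere_new = placements(well, piece)
            .iter()
            .any(|w| !seen.contains(w));
        if leads_somewhere_new {
            return piece;
        }
    }
    // Every piece only revisits old wells; fall back to the worst one.
    pieces_worst_first(well)[0]
}
```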

This shifts Hatetris from being just a machine learning and game search problem to something that David and Felipe are quite good at: pattern analysis. You can only take each loop once, but if multiple loops start at the same spot, if you can force Hatetris to send you on a second loop, the game changes.

It's exciting to see these ideas sparking off one another. Sharing leads to learning and everyone's growing from each other's insights. Each published idea is building on the last. Our current approach, holding the world record, is based on New Jade's approach. New Jade's is based on Tim's. Tim's is based on ours again. Ours is based on New Jade's again. New Jade's approach for 32 points all the way at the beginning of our involvement

He actually did not come up with the idea of using a beam search for Hatetris. That was another programmer by the name of 3Pipes. 3Pipes, in 2019, for a class project, decided to write a Hatetris solver. And he made an attempt at it. He extracted the Hatetris emulator from the original JavaScript rather than writing his own, which was a very interesting approach. He made a beam search implementation. And it didn't get very far. He got a score of 16, 17 points.

He finished it for his class. He never touched it again. He moved on with his life. But he wrote down what he did and he published it online. And New Jade found it in 2021. And that's what gave him the idea

to start the beam search that led to his world records and that got this whole thing started off. It's not all about flashy thousand-word blog posts. It's just about coming up with some idea, implementing the idea, and then telling people if it worked or not. Hatetris does not matter in a very real sense. This record is completely irrelevant. Nobody but us cares.

The concept of keeping a continual chain of knowledge going, of reading what other people have done, and then writing down what you've done differently and what you've done the same and whether it worked, and passing that on for the next person, that's something that's universal, I think.

David and Felipe are impressive. It's amazing how they tackled a specific game problem, pushing the boundaries, and making unique discoveries. And yet the thing that they want to make clear is they are not, in fact, specially qualified to take on tasks like this. It boils down to, if you're working on a project and you are not the expert on the project...

There is a large community of people somewhere, or a small community of people somewhere, who have probably thought about this problem before. They've probably put time and energy into this problem before. And they're delighted that someone cares. And so if you show up and you send them a nice message and you say, hey, I saw your post about, oh, let's say Trackmania. I see you tried this on Trackmania. And we're trying a similar thing. And I was curious about your approach. They will reply to you, even if they haven't touched this project in three years, and tell you as much as they can about it.

And those types of interpersonal connections with other people across communities working on things tie into the idea Dave just gave. But I think it's even more fundamental, the fact that people who care about projects are happy to share their knowledge. And if you just seek them out, they will share it with you freely, happily, without any particular complaints.

And so, yeah, this is the story. This is the story of how two friends who couldn't understand the AlphaZero paper but thought, hey, we'll implement it anyways, got to the frontier, got to the coalface of game search trees, or whatever this field is called. They just kept pushing it, even if occasionally Felipe had to check in with his wife.

Oh, she indulges my "I am doing a project with Dave," because she doesn't want to hear any more about Hatetris at the dinner table. So it tends to be a lot of, are you doing stuff with Dave? Yes. Okay. I will leave you to it. I don't need the details. On your side, Dave, is there, I don't know what your situation's like. Is there any relationship strife ever caused by these projects? No, I can't say that there is. I don't.

I expect it could happen at some point because I can't see giving up doing these projects at three in the morning anytime soon. But I'll cross that bridge when I come to it. Where do you work, David? Like I missed it. Oh, currently nowhere. I am looking for work. Yeah, the most recent place I interviewed with, I had five interviews over the course of two months. If you have five interviews, including one where you get a full site tour and it's a three-hour process and everything,

You really are getting your hopes up, even though you're telling yourself not to get your hopes up. And they said, basically, hey, within a week, we will get back to you one way or another on the final offer. And that was three weeks ago. I've emailed them back, just never heard anything. So that's why we do this sort of thing to distract us. At least that's why I do it. I can't talk for Felipe. Beyond hitting 366, Hatetris offered David and Felipe a rare escape.

It was a place to bond and experiment and swap stories late into the night. I feel like there's something deeper here. The reason we chase these seemingly trivial puzzles, I've been known to do it myself, maybe not quite as far. But the reason we can pour months of our lives into side projects, into a JavaScript game from 2010,

is because there's something fundamentally human about this process of striving. It's about curiosity and the joy of discovering and about the thrill of pushing up against the boundaries of what seems possible. And sometimes it's simply about finding a good excuse to crack open Discord late at night and tackle something impossible side by side with a very close friend.

That was the show. Thank you, Felipe and David. I hope your quest continues. After the interview, I got to hang out with them for a bit, and I was shown some of their unpublished work on loops. You know, David showed me a Mathematica map of these loops, and it looks like some demented subway map, but...

They're working on something that uses that. A new way to break ground. I'm sure if they do, they'll post the results on their blog. By the time you hear it, it might already be up. So check out hallofimpossibledreams.org to see their latest projects. Sometimes it's Hatetris. Sometimes it's the dead internet theory. Sometimes it's Superman fan fiction. There's always something they're sharing and something they're working on. And in my short time with them, I

I saw how unique they are. They're both very different people, but in a way that makes them greater than the sum of their parts. And if you're listening this far, I'd be remiss to not mention that you should join as a supporter of this podcast. I know there's somebody out there working on a side project who needs to hear the story of these two. Maybe it's you. But stories can be powerful.

And stories can be something that sticks with you and allows you to learn from somebody else's experience. So that's what I'm trying to do here. If you enjoy it, support me or just listen. That's why I always end by saying, until next time, thank you so much for listening.