So, yeah, I'm Ilya Reznik. I am the Instructed Machines janitor and the CEO at the same time. And I actually don't drink coffee. Welcome back to the one and only MLOps Community Podcast, folks. Today, I feel honored because we get to talk with Ilya. And there are not many times that you can chat with a staff-level machine learning engineer from a company that's
like Facebook or Meta, as it's called these days. Today is that day. Ilya recently resigned, I guess you could say, or quit, the way that he likes to say it. He was more blunt about it. And he did some consulting afterwards. And now he's spending his time helping others recognize the best way to become a staff machine learning engineer.
So we talked a little bit about that. We talked about his recent trip to NeurIPS and everything in between. For those that are tuning in on the podcast level, I've got a nice little song that I will share with you. This is one of my favorite bands, D-Pulse. And they did a little remix of one of my favorite songs. ♪
The velocity of love.
We were talking about why fine tuning is not all it's cracked up to be. And you said that the TLDR is don't do it. Yeah. Fine tuning is a great technique and there's a lot you can get out of it, especially like
in the LLM world, if you're trying to get a particular shape of output, it may be worth it, right? Like if what you're really after is JSON, it may be worth fine-tuning the model for that, though modern models understand JSON a little bit better now.
But the problem is, every time you do it, it's a lot of effort. And the first model that comes off the press is usually worse than the original one. The whole idea of fine-tuning is that you're taking this huge LLM, or whatever other model, that can perform really well on many different tasks, and you're trying to get it to perform better on your task at the expense of performing worse on all the other tasks. And yeah,
That works sometimes. But if you can prompt-engineer your way out of it, you should do that first. Yeah. The other piece on it, which is hilarious to me, is that you fine-tune it and it's a bit of a crapshoot, because you don't know if what comes out the other side is going to be better or worse until it comes out. So you spend all this money and time and energy on fine-tuning just to recognize that, ah, it's actually not that good. Yeah.
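To make the "prompt-engineer your way out of it first" point concrete, here is a minimal sketch of getting JSON out of a model without any fine-tuning: ask for it in the prompt, validate the reply, and retry on failure. `call_llm` is a hypothetical stand-in for whatever model client you actually use, and the schema is invented for illustration.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM client you use."""
    raise NotImplementedError

PROMPT_TEMPLATE = (
    "Extract the customer name and order total from the text below.\n"
    'Respond with ONLY a JSON object of the form {{"name": <string>, "total": <number>}}.\n\n'
    "Text: {text}\n"
)

def extract_order(text: str, max_retries: int = 3) -> dict:
    """Ask for JSON in the prompt, validate the reply, and retry on malformed output."""
    for _ in range(max_retries):
        raw = call_llm(PROMPT_TEMPLATE.format(text=text))
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON, ask again
        if isinstance(parsed, dict) and {"name", "total"} <= parsed.keys():
            return parsed
    raise ValueError("model never produced valid JSON")
```

Only if a loop like this still cannot hold the shape does fine-tuning for form start to pay for itself.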
Yeah, and you've got to be very careful with the data you fine-tune on as well. You can fine-tune a large model on a few thousand examples, but you've got to be pretty careful about what those few thousand examples are. People usually use whatever they have, rather than thinking through: okay, how imbalanced is my data? What's actually happening here?
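As a rough illustration of that check, here is a small sketch that just counts labels in a fine-tuning set before training on it. The JSONL path and the "label" field are assumptions about how the data is stored.

```python
import json
from collections import Counter

def label_distribution(path: str) -> Counter:
    """Count labels in a JSONL fine-tuning set (assumes one {'prompt': ..., 'label': ...} record per line)."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            counts[json.loads(line)["label"]] += 1
    return counts

if __name__ == "__main__":
    counts = label_distribution("finetune_train.jsonl")  # hypothetical file
    total = sum(counts.values())
    for label, n in counts.most_common():
        print(f"{label}: {n} ({n / total:.1%})")  # eyeball how skewed the set is
```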
So yeah, fine-tuning is a great technique. I liken it to training an embedding in the first place. That's really more magic than anything else, because I've trained so many embeddings where you're just pulling your hair out. You can see it's all gone now. And I used to look like you before I started training embedding models.
And then at the end, you're like, oh, my objective function is wrong. I shouldn't be training for this, I should be training for something else. And you switch that and everything just works magically the first time. You know, the first time after months and months of work. Yeah. So, yeah. And you learn that the hard way. It's not like you can read a blog post and go, oh, this is exactly what's wrong with mine. You have to just kind of trial-and-error it.
Yeah. Well, in fact, there is no blog post, right? All of these problems are one-offs, and just because it worked this time doesn't mean it's going to work on the next one. So there are best practices, and you start there. But at the end, there's some voodoo. You've got to make sure nobody's poking your doll with a needle. I mean, I've heard a lot about this fine-tune-for-form idea, and one thing that I can't really wrap my head around is: what are the other reasons you would fine-tune for form if it's not JSON output? Well, I mean, there's other structured output, right? Like HTML or Markdown or whatever. But also, sometimes it's not just form. Sometimes it's
if you're in an area where the language is really particular. Expert language is very different; I worked with a medical company and we had to do that. That's another place, right? Because the distribution of tokens is really different when basically every word you say is Latin with occasional English in the middle. So those kinds of scenarios, where your output is quite different.
People try to do that for knowledge, too, when there's a very particular set of facts. But the problem is LLMs don't have knowledge. They don't store facts; they store probabilities. And so in that case, I don't think it's useful at all to fine-tune. I haven't found it useful yet. Yeah. That's where RAG will get you a lot further. And even RAG is no panacea. What RAG does is it puts the fact a lot closer in context. But it doesn't mean that... it depends on how strong the prior was for the LLM, right? If the prior was really strong, it still won't pick it up. So what people call hallucinations are not hallucinations. They're an artifact of the way the model works. The term hallucination assumes that, oh, you're just out in left field. You're not; this is a reasonable prediction based on the distribution of training data. So I think LLMs are extremely useful. I think they will be with us for a long time. I do not think that they're going to be the only technique going forward. And what do you mean by that?
I think there will be... so on the non-ML side, right, you have databases, but nobody says, oh, I don't know, MongoDB solves this problem a hundred percent. No, Mongo is part of the solution that's there, that we rely on, and it's useful. Well, it depends on who you ask. Some of my friends are like, don't ever use Mongo. But you need to build a solution around it. And an LLM is similar. You need to build some knowledge retrieval around it. Is it RAG? I don't know. Maybe it's a graph. And there's a lot of talk...
I know you want to talk about this, so I don't know if I should be going there yet. But I was just at NeurIPS last week, right? Yep. And there's a lot of talk about how we ran out of data, and what we ought to do is reinforcement learning. And people are like, wait a minute, RLHF? And all the reinforcement learning experts are like, ha, ha, ha, that's not reinforcement learning. Not at all.
And reinforcement learning is one of those cool things. It's always the next big thing, but it's never been the big thing, you know? Yeah. Like agents. So I think there will be some reinforcement learning in that system as well. There will be some RAG in that system as well. There will be some other techniques around the LLM that make it actually useful. Yeah.
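For what "build some knowledge retrieval around it" can look like in the simplest case, here is a hedged sketch of naive RAG: embed the documents, pull the few most similar to the question, and stuff them into the prompt so the relevant fact sits close in context. Both `embed` and `call_llm` are hypothetical stand-ins for whatever embedding model and LLM client you use.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical call to a sentence-embedding model."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Hypothetical call to the LLM."""
    raise NotImplementedError

def answer_with_retrieval(question: str, docs: list[str], k: int = 3) -> str:
    """Naive RAG: rank docs by cosine similarity to the question and put the top k in the prompt."""
    doc_vecs = np.stack([embed(d) for d in docs])
    q = embed(question)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    context = "\n\n".join(docs[i] for i in np.argsort(-sims)[:k])
    prompt = (
        "Answer using ONLY the context below. If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

As noted above, this only moves the fact closer to the model; a strong enough prior can still override it.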
It's interesting to think about. Like you're saying, okay, databases are there, but then you have different flavors of databases, whether it's your Mongos or your CockroachDB or Oracle or whatever. And you, as the architect of the system, get to decide what your trade-offs are and what you're really trying to optimize for. And so you're going to have... maybe an LLM is the center of that, or maybe it's not. Maybe it's graph RAG. Maybe it's a fine-tuned LLM, or a lot of fine-tuned small language models, or something like that that we will see. And it's cool to think about, because you get the flexibility to choose. One thing that I do feel like, though, is that you have
some standard patterns that are coming up. And if you look at it like trails in the woods or trails to the top of the mountain, you definitely see there are standard design patterns that folks are kind of taking. And one of them is not the fine tuning route.
And so I think it's pretty clear that if we're going to fine-tune, we'd better have a really good reason and we'd better really need to do it, because everything else up until this point has failed us. So fine-tuning is our last option. Yeah, fine-tuning is expensive, and even when you're super successful, it gives you a limited benefit. Sometimes that's what you need; I absolutely agree with you. It's all trade-offs, right? Sometimes that last 2-3% is exactly what you needed. But it's not, you know... if the model is really bad at math, fine-tuning is not magically going to make it better.
But if the model is like, oh, you keep giving me really informal things when I ask you for a formal letter, or what I really care about is LaTeX output and what you keep giving me is Markdown, yeah, that's a great place for it. You could probably get that kind of improvement from fine-tuning. But it's not the be-all and end-all. About the best practices, I do want to caution that
you're on shifting sand a little bit. We're all on shifting sand a little bit. Best practices that emerged today, in two years we might be like, can you believe we used to do that? That's such a great point. I mean, even just with RAG, now people are like, yeah, what we did a year ago is called naive RAG because we were naive back then. Now there's advanced RAG, or you've got to do agentic RAG. And even today, what is it, the ReAct agent architecture, that's now not even seen as really useful because there are much better ways to do it. So it's true that this is going to be shifting very quickly. Yeah.
It's almost like month by month you're seeing new and better ways to do things. Or maybe they just seem new. You get a lot of energy and attention around them, and it feels like, wow, this is much better. And then after the community has played with it a bit, they recognize that it's not actually that valuable. So shifting sand is a great way of putting it, because you're not super clear on what is signal and what is noise. Yeah.
Yeah. And you know, it's evolving. There's a lot of money going into this; obviously there's a lot of benefit to getting it right. It's not the kind of thing where you can learn everything now and then spend the next 20-year career on what you've learned. It's the kind of thing where you learn stuff now and then you keep learning stuff for the next 20 years. Yeah, totally. Well, speaking of which, I mean, that's what
the conference you were at last week, NeurIPS, is why it's so cool, right? Because it brings all of the newest stuff out into the open. What were some of your favorite takeaways, besides not being able to order an Uber, like you mentioned before we hit record? I actually used buses most of all, more than Ubers there, because Vancouver is a very good public transportation city. I don't know if people from there would say that, but where I stayed, that was definitely really useful. But yeah, NeurIPS is terrific. NeurIPS is the flagship conference in ML and has been for decades. The first NeurIPS came out the year that I was born, so it's been around for a long time.
And it started out as a very academic conference between neurobiologists and machine learning people, when they were like, hey, we can learn stuff from each other. And the main takeaway for me, really, is that I didn't see that many truly academic papers and posters and presentations; the majority of the content is applied right now. There's a huge push into: okay, we have transformer models, how do we use them? How do we actually make them useful? Which is not usually the focus of NeurIPS. The focus of NeurIPS is usually, let's look for a new architecture. Let's go off...
And so there's definitely convergence on transformers. And you see this from different industries. When I was coming up in this, it was like, if you did computer vision, you didn't do language. I was the weird one out saying, no, I can do either. But now everybody's working on the same model. So one invited speaker put it really well. They said, you know, we used to have a lot of models, and then we kind of converged toward the transformer architecture. But now we've got to diverge back into the applications: how does this apply to everything? So that's probably my main takeaway, that there's a lot of thinking happening about the system that we talked about. Like, what does that system have to have? And how does that work? We're way more advanced than we used to be a few years back. We're talking about
left-to-right LLMs versus right-to-left LLMs, and how they do well at different kinds of things. In English (there are languages that are read the other way, so yes, but in English) the left-to-right model does really well at language; the right-to-left one actually does math a lot better. We're talking about how not every token is as important as the others, so some tokens are fuzzy and maybe you should be treating them as such.
We're talking about reinforcement learning. There's a lot of excitement about using reinforcement learning as we're running out of data, or as it feels like we're running out of data. I'm not even sure I buy the argument that we're running out of data. Ilya Sutskever talked about that, and his argument was that we only have one internet. And I was like, dude, we're uploading a new internet like every two weeks. This podcast did not exist. Right. Exactly. And now it does.
There are rumblings on the side and in the hallway track. This didn't make it into the main conference, but there are some rumblings about curriculum learning, which is a thing we kind of gave up on entirely. What is that? Yeah. So, the way that we train today, it's as if, when you were little, your mother came to you and said, here's 80% of everything I know, shuffled randomly. Now I'm going to test you on the other 20%, right? That's just not what happens with kids. What we do is we tell them: here's a sound, here's a word, here's a sentence. And they progress through that in a supervised manner; a teacher tells you, this is the first thing you need to learn, this is the second thing. But that's not what we do with models. What we do with models is shovel in the whole soup and say, you go figure it out.
And so curriculum learning is about how we create that step-by-step process, where a model can learn something basic first and then start building on top of that. It involves a lot of data curation, and it's very time-expensive, but it might be worth it. It might get us better outcomes. And so we're starting to talk about that again. That hasn't really been a conversation for the last 20 years, honestly.
But it's coming up again. And just because it's more structured, it will be more efficient? Is that the idea? Well, the idea is that it respects the concepts in the real world. It respects that addition and subtraction are prerequisites to multiplication. And so the idea is that you can be quite a bit more data-efficient, and you can also start really understanding concepts and symbols that represent the real world. And then you can start avoiding things like hallucinations, because you're not reliant entirely on statistics. Those statistics came step by step, and you know that the early steps are fundamental. Yeah.
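A toy sketch of that ordering idea, under the assumption that you have some per-example difficulty signal (here just text length, though a teacher model's loss would be more realistic) and some `train_step` function for your model; both are hypothetical placeholders.

```python
def difficulty(example: dict) -> float:
    """Hypothetical difficulty score: length here, but vocabulary rarity or a teacher-model loss are common choices."""
    return len(example["text"])

def train_with_curriculum(model, examples, train_step, n_stages: int = 4, epochs_per_stage: int = 1):
    """Curriculum learning sketch: start on the easiest slice and gradually grow it to the full dataset."""
    ordered = sorted(examples, key=difficulty)
    for stage in range(1, n_stages + 1):
        cutoff = int(len(ordered) * stage / n_stages)  # stage 1 = easiest quarter, last stage = everything
        subset = ordered[:cutoff]
        for _ in range(epochs_per_stage):
            for example in subset:
                train_step(model, example)  # hypothetical single-example update
```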
It's problematic, though, because it's a lot harder. But also, NeurIPS was always about being inspired by biology, and this is very much inspired by biology. So, yeah. It's an interesting concept. I don't know if it'll ever gain traction; lots of things we talk about at NeurIPS never really leave the conference. But...
But that was one that struck me. The other thing, for me, with my background... so when I was at Twitter, I owned model evaluation, offline model evaluation, for all of Twitter. And the interesting thing is there have been a lot of talks on evals and how bad they are. Even at NeurIPS, there were still talks? Even at NeurIPS. I feel like the only talks I saw this year were RAG talks and eval talks. If you boil them down, the RAG talks were: RAG is difficult, you've got to watch out for where it fails, here are all the ways that it fails, and maybe here are some tricks you can do. And the eval talks were: evals suck, we don't have any good ones, these leaderboards are steering you astray, it's all marketing, and don't listen to anybody who says they have a SOTA model.
Yeah. Yeah. So there was a talk that I went away from with my mind blown, and it was about something else, but this is the line that blew my mind. The speaker was like, yeah, we have this benchmark and we got 95% on it, or 90-something, I don't know, some high number, and the model can't do anything real in the real world. And I'm like, what is the disconnect that you can get 95% on your benchmark and still not be able to perform any useful work? Yeah.
Clearly your benchmark is not what you think it is. Yeah. It's not valuable. And then you add to that... so evals are a mess early on. And we talked about LLM-as-a-judge a few years ago; I was fortunate enough to work with awesome people at Arize on explaining this and putting that concept together.
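For readers who have not seen it, LLM-as-a-judge in its most stripped-down form looks roughly like the sketch below: a (usually stronger) model is asked to grade a candidate answer against a reference with explicit criteria and a structured score. `call_llm` is a hypothetical stand-in, and in practice the judge's own biases become one more thing you have to evaluate.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical call to the judge model."""
    raise NotImplementedError

JUDGE_PROMPT = """You are grading an answer to a user question.

Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Score the candidate from 1 (wrong) to 5 (fully correct and complete).
Respond with ONLY JSON: {{"score": <int>, "reason": <string>}}."""

def judge(question: str, reference: str, candidate: str) -> dict:
    """LLM-as-a-judge: ask another model to grade the candidate answer."""
    raw = call_llm(JUDGE_PROMPT.format(question=question, reference=reference, candidate=candidate))
    return json.loads(raw)  # in practice, validate and retry like any other structured output
```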
But that's not even the panacea. And then you go into the agentic world, and then you go into the multi-agent world, and I don't even... how do you evaluate a multi-agent system? The amount of complexity. And some of this is understandable, right? I think Ilya, in his talk, talked about how, as the intelligence grows, it's understandable that you will have a harder and harder time evaluating it. Which is cool, but my job is to make sure I can evaluate it. And I don't know how to do that, honestly. There are a lot of ideas, there are a lot of concepts, but I don't know that we've converged yet. There was a talk by D. Sculley, the CEO of Kaggle. Love that guy. I love him. And for anybody that doesn't know,
he also wrote the most incredible paper when it comes to MLOps, back around 2015, the one about machine learning being the high-interest credit card of technical debt.
Everyone's probably seen it if you've ever seen a talk on MLOps, because it has that diagram of all these boxes of things you need to do to put machine learning into production, and the model box is just a small box out of all the other stuff you've got to think about when you're productionizing machine learning. But anyway, I digress. Keep going. No, that's terrific. Yeah. D. is amazing and I think the world of him. But
his talk basically made me paranoid. I was already paranoid, but he's like: at Kaggle, their main problem is that people find signals that aren't actual signals. They leaked into the data somehow. Somebody put all the positive examples in one folder and all the negative examples in another folder, and your model is not predicting anything real. It's just predicting what folder the example was in, which is easy, because you have the path.
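The folder-path version of that leak is easy to reproduce in a toy example: if the label is baked into the path, a "model" that never looks at the content scores perfectly, which is exactly the kind of signal a benchmark can accidentally reward. The paths below are made up for illustration.

```python
# Toy dataset where the label leaks through the directory name.
dataset = [
    ("data/positive/img_001.png", 1),
    ("data/positive/img_002.png", 1),
    ("data/negative/img_003.png", 0),
    ("data/negative/img_004.png", 0),
]

def path_only_model(path: str) -> int:
    """A 'model' that never opens the file; it only looks at which folder the file sits in."""
    return 1 if "/positive/" in path else 0

accuracy = sum(path_only_model(p) == y for p, y in dataset) / len(dataset)
print(f"accuracy from the path alone: {accuracy:.0%}")  # 100%: the split leaks the label
```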
And his point was that researchers should look at it that same way too. Not that you're intentionally going to hack your metrics, but you're unintentionally going to hack yourself. And so he talked about, okay, what can we even do? The problem is that when your data is internet-scale, everything is already in there, right? Everything that ever hit the internet is leaked, so you can't hold that data out anymore. And so he talked about new benchmarks that are built around some sort of separation. Either a physical separation, because the data is not on the internet yet: he talked about, for some math benchmark, having top mathematicians in the world go into the woods, sit in a room by themselves, come up with problems that have never existed before, and then come back and evaluate models on those.
And then he talked about time separation as another one. So if you're trying to predict, I don't know, the stock market or something, you can't fully evaluate your model until six months later. And, you know, unfortunately, that's not how it works with our stock market predictions; all of those models are going to be irrelevant tomorrow, so we need to retrain. But his point is that we need to be a little bit more patient and spend a little bit more time generating data for benchmarks.
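A minimal sketch of that time-separation idea, assuming each record carries a timestamp field (the field name and cutoff date are invented for illustration): everything before the cutoff is fair game for training, and the held-out set only comes into existence after the model was built, so it cannot have leaked.

```python
from datetime import datetime, timezone

CUTOFF = datetime(2024, 7, 1, tzinfo=timezone.utc)  # hypothetical training cutoff

def time_split(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Time-separated evaluation: train only on records before the cutoff, score only on records that arrived after."""
    train = [r for r in records if r["timestamp"] < CUTOFF]
    held_out = [r for r in records if r["timestamp"] >= CUTOFF]
    return train, held_out
```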
Because even, you know, we do our best, but all the benchmarks for LLMs, all the data for them, is leaked. All the data for them is on the internet. And so this is how you get to a scenario where you're 95% accurate on your benchmark. And worthless. And worthless. Yeah. That's why I appreciate leaderboards and benchmarks like ProLLM. That's from the Prosus team. They put it together, and all of the actual data created for those benchmarks is closed source, so theoretically the models haven't seen it. I can imagine that's only good for the first couple of go-arounds, and then the models have seen it. So the Prosus team, I think, is constantly updating that data and constantly creating new sets. And they have different ones; I know they have one which is from Stack Overflow, and that is public data, so I can imagine that most of the models have been trained on it in some way, shape, or form.
Which is funny, because a lot of these models still do really badly on it. But take that for what you will; I don't really know how to interpret it. Well, and just to be clear, I know lots of folks who train the large language models that we use every day, and I think all of them are really decent and try really hard to avoid data that they know they're going to be tested on. But it's an easy thing to miss, because you can't look at every data point that goes into this. And when you're just like, just give me the internet, like, how do you even vet that? I don't know. Then there's a lot of stuff that leaks in. And so I don't think it's intentional hackery. I think it just happens by virtue of the way that we train these. You're going to love this. I heard about some kind of nonprofit on a podcast.
Their whole thing was: let's make the internet mediocre again, so that this AI is not better than us at doing things and we don't get unreasonable expectations put on us because AI can help us do it. And so what they've been doing is going around buying up domains in the Common Crawl. And instead of doing nefarious things, like I've heard of other people trying to do, where they put poisonous data on these Common Crawl websites, they're just putting a bunch of really mediocre content that isn't good at all on all of these Common Crawl sites. So what they're trying to do is flood the LLM training data. But at the end of the day, you have to buy so many of these and put so much content out there to even make a little drop in the ocean, right? I think the internet does a pretty good job of making itself mediocre without extra help. Without those guys. That's a great point. Yeah. I mean, so the funny thing there is, I know you've spent a lot of time training models and thinking about that. How do you go about
that data issue and the data quality issue? I mean, yeah, it's hard. And I'm not sure that I have a silver bullet, right? Oftentimes, a lot of the things that I've trained are pre-LLM; I worked in this industry for 10 years before LLMs ever hit, so most of my experience is still pre-LLM.
And even since LLMs came out, what I was working on was Meta's ads prediction models, and those aren't trained on the same kind of data. You can be a lot more careful there. But you kind of...
I don't know that there are great ways to do this, other than: you do it, and then you look at your evaluations and you trust them (which we just talked about how you can't), and they guide you to the segments where you're maybe underperforming, and then you look at what's happening with those segments, and then you train again. The beauty, and kind of the downside, of working in ML is that you can always try again.
And you should always go in with the idea that you're probably not going to think through every single contingency, every single data issue. When you gain enough experience, you've been burned by enough things that you start checking for them. But even for me, 10 years in, I don't think that checklist is all that inclusive of everything that could happen. So it's an iterative process, and it's a process that you learn from, and you do a little bit better next time. Hopefully the person next to you has been burned by a couple of different things. It really does help to work with different people. You gain a lot of knowledge that you don't pay for by bringing down production, which I have done. I've done that before.
But battle scars, battle scars. And that's good knowledge. It's even better knowledge when somebody next to you is like, hey, I've brought down production this way before, let's not do it again. And so there's a lot of iteration. Back when I started in the industry, we used to do a lot of feature engineering, but the datasets were a lot smaller. On a small dataset, you can start understanding it really well. But how do you even go about understanding language data? You're going to have an embedding and you're going to visualize it, but guess what, how are you going to get that embedding? You're going to train an embedding model.
And so when your tool for understanding the data and the model you're training are essentially one and the same, you just try it. You just try it and see what comes out of it. And I know that's pretty wasteful of compute. I know that there are environmental consequences. I don't tend to think of them when I train; I probably should, but
you're trying to be compute-efficient because it's expensive, not because it's environmentally friendly. Even so, you kind of go into it with the expectation that you're not going to be right the first time you train it; you're going to need to do some work on it. And ideally you know all of this: you start with a small model, you start with overfitting on a small dataset to see whether, even directionally, you're right. You do early stopping to make sure you're not training longer than you absolutely have to. But even with all those hacks, the way that we do machine learning today is just a lot of data and a lot of compute. You do some thinking, but the thinking is fairly limited compared to the amount of data and compute. Yeah.
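Those two habits, overfitting a tiny batch first and then stopping early on a validation signal, look roughly like this sketch (PyTorch, with synthetic data just to make the loops runnable; the model, sizes, and patience are placeholders):

```python
import torch
from torch import nn

torch.manual_seed(0)

def make_model() -> nn.Module:
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

loss_fn = nn.MSELoss()

# Sanity check 1: overfit a single tiny batch. If this loss will not collapse toward zero,
# the model, loss, or optimizer wiring is broken and more data will not save you.
model = make_model()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
xb, yb = torch.randn(8, 10), torch.randn(8, 1)
for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
print(f"tiny-batch loss after overfitting: {loss.item():.4f}")

# Sanity check 2: early stopping on a validation loss, so you never train longer than you have to.
model = make_model()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_train, y_train = torch.randn(512, 10), torch.randn(512, 1)
x_val, y_val = torch.randn(128, 10), torch.randn(128, 1)
best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(200):
    opt.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    opt.step()
    with torch.no_grad():
        val = loss_fn(model(x_val), y_val).item()
    if val < best_val - 1e-4:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"stopping at epoch {epoch}, best val loss {best_val:.4f}")
            break
```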
Well, we should talk about the stuff you're doing around ML careers, for people who are in the AI world. I think you're doing a huge service to people because of the experience that you've had. I know that when we first talked, probably back in 2020, you were still at Twitter. Since then, you've gone on to work at Meta and then the health tech company. But the interesting thing is, when you were at Twitter, I think you had just gotten the job, and you were like, yeah, I'm thinking about trying to take that next step into becoming a staff machine learning engineer. And for me, that was fascinating, because it was like, oh, okay, so what do you need to do there? And you were like, well, I think I need to give more talks and be out in the community more.
Turns out that wasn't true. Or was it directionally correct? Because, first of all, you didn't give any talks. I kept bugging you to give a talk in the community, and you were like, yeah, yeah, later, later, later. And you ended up getting to staff engineer anyway. So what happened? Well, now I can give talks as a staff engineer. No, I think different companies do it slightly differently, right? And Twitter in particular was quite...
they put a lot of emphasis on open sourcing stuff and publishing within Twitter. Unfortunately, you know, I'm sorry, it might be my fault: Twitter died while I was there. And I think it has something to do with the change in ownership. But you were there when Elon took over? No, I left a little bit before. I was there for the entire drama, and I left like a month before, because it was pretty clear where it was going.
So yeah, Twitter was a terrific place to work, honestly, and I don't think it is today. But different companies prize different things, right? And so there's more than one way to get to staff. And at staff, you really start seeing the archetypes.
And different people get to staff for different reasons. Some people get to staff because they're really broad. Well, you're always going to have to be T-shaped rather than just broad: you have to be broad, but really deep in something. Some people, like me, get to staff because we can understand a lot of different things across the ML model lifecycle.
And other people get there because they're an ultra-specialist on, you know, text embeddings after it rains, or some such. The Venn diagram... it's like there are a lot of different pieces that they're just in the center of. Yeah. Like, wow. Yeah. And staff engineers are leaders. So a lot of what you need to demonstrate is that you can produce at a high level over time. And so people get frustrated. They're like, oh, I've done this for a year, when am I going to get staff? And I'm like, you know, your time horizon at staff is like three years for anything you plan. So just because you performed at that level for a year, I'm not quite convinced yet sometimes. Wow. So some of it was just timing, right, for me; I still needed to wait. And honestly, I didn't get staff at Twitter. I got staff when I went over to Meta. And part of it was
my interview performance, and part of it was all the experiences that I'd had before. But I know that Google, for example, does prize talks and being visible in the community in some way for their staff, and especially senior staff. Everybody I know who's senior staff at Google has done at least some conferences, has said at least something that's worthwhile. So yeah, I mean, the problem with this is that there's more than one path. And in ML in particular, I think there are a lot of folks who do a really good job, like in the MLOps community, talking about MLOps. There's a ton of good material for software engineers on how to advance through your career, but there's not a ton of material for MLEs, and what material there is usually comes from really well-meaning folks who haven't seen the higher rungs of the ladder.
And it changes by the time you get up there. But one of the things I'm working on is, next year I want to invite a lot of folks who are more senior than me and do a podcast with them specifically about their careers. How did they get to where they are? And ask them really targeted questions. Like, I'm trying to get a former director from Meta that I worked with
to come and talk about: what's it like to be a director? What are the challenges that happen there that you don't see until you get to that level, right? So, a lot of folks who are a lot smarter about it than me, to try to understand that area a lot better. In the meantime, I do have a small YouTube channel, which right now is about helping you get the next ML job.
I really wanted it to be broader. And so I did a video on mentorship, and it tanked. Nobody watched it. To this day, it has like 200 views, and my other videos have like 14,000, right? And what I understood was that nobody is talking about the ML career path at this level, so people don't think about it until they have to look for a job. And then they have to look for a job and they're like, where do I go? And so I started the channel with content leaning a little more heavily toward: how do you interview? How do you get this next job? And it's kind of understood, in my mind, that at some point it'll be a bait and switch, where I'm going to say, okay, now that you've got the job, here's how you actually do it. And here's how you get promoted. And here's how you make sure you don't ruin the world while you're doing this. So that's kind of the direction of it. But hopefully it'll be gradual enough that people
won't unsubscribe. Yeah, no, they'll go with you on the journey, because they're probably in the job going, uh-oh, now I've got to find a new YouTuber who will tell me how to do this. Might as well just keep watching me, you know. You don't need a YouTuber. But there are things about our profession that are different, for example the shifting sands that we talked about, right? Yeah. So, my master's thesis was about computer vision. I started it in 2011 and finished it in 2013, and it was Haar cascades, right? Because nobody had heard of a CNN back then. And by the time I got it out in 2013, it was like, dude, you should have done it with CNNs. And I was like, I know, but I'm not going to do another two years. I'm done. But I know I should have done it with CNNs. So it's the only field I know of where your graduate thesis can be irrelevant by the time you're done writing it.
And the new kid graduating tomorrow probably knows more about transformer models than I do with 10 years of experience in the field, because I have to maintain the models that are there today, and they had all the time in the world to go learn about the latest and greatest. So how do you even have a career in that? How do you stay multiple decades in that kind of thing? Transformers weren't a thing before 2017. Nobody really understood that this was useful. A couple of people did, but they were considered crazy. And it wasn't until you could bring a lot of scale to it that you even saw it, because I read early papers on the attention mechanism and I was like, yeah, it's interesting, but we have LSTMs. I'm not sure how this is better than an LSTM. I literally said that. So please don't take, you know, future advice from me. But I also sold 500 bitcoins at $1 each. No. Yeah. Oh, man. Well, I'm glad it still worked out for you. Yeah. You had a pretty successful career. You didn't need those. Oh, I did. I did.
Stupid taxes, they call it. We've all got to pay that kind of stuff. And you're just continuing to kill it, so I love seeing that. And what you're talking about is something very few people in the world actually talk about. I think you mentioned it to me the last time we chatted, where you were saying the folks that are in these positions are so busy doing their jobs that they usually can't talk about what it means to be in these positions and how you get to them. Like you said, everybody's journey into them is different. And then the other thing is, nine times out of ten, when you're a staff MLE, you're at a large company and you've probably signed some, or a lot of, NDAs. So it's not like you can just go out there and preach from the rooftops. Yeah. Yeah. And I don't know if it's intentional gatekeeping, but there is a little bit of gatekeeping happening as well, which is weird to me. Because for anybody who came up in this industry when I came up in this industry,
part of the fun was that everybody made everything open source, and you could go in and be like, oh, PyTorch, I can understand that. And Meta still operates similarly; that's why Llama is there. It's like, okay, we can still get our benefit, but let's make sure people can use these kinds of techniques and models and whatever. But a lot of other companies have started being a lot more closed off. Notably, OpenAI. Yeah.
I like the emphasis on the open part. I think both parts of their name are a lie at this point. Closed ML is what they should be called; that should be their real name. The other piece that I wanted to talk about is
what should you know as a staff engineer, a staff MLE? What are the things you deal with, and how is that different? You mentioned how big of a gap it is from, what is it, L5 to L6; that's a really hard gap to jump. And first of all, it's probably worth clarifying that staff titles exist, nine times out of ten, at large companies, right? Tech-forward companies. I don't see a lot of staff titles at companies with 200 people, or at enterprises. At a bank, I don't think I see many staff titles. So maybe you can demystify that part for me too. I think in a bank, if you make that much money, you have to be a VP.
I think there are legal reasons why you have to be a VP. Interesting. Banks have very weird titles. There's title inflation going on everywhere, too. Sometimes you look at a small company and you're like, oh, that's a principal engineer; if they converted to a FAANG-type company, they'd be senior, maybe mid-level. One of my peers at Meta was a VP before, an actual VP with an org, at a bank. And I'm like, we're at the same level? This seems wrong. But that's how it works. So when I talk about staff, I'm talking specifically about tech companies, and specifically bigger tech companies, right? Not necessarily just the biggest ones; Uber, you know, Microsoft, all of those companies as well. But yeah, tech-first companies. Yeah, tech-first companies, I think that's a good way to put it. And I think, again, it depends a lot on the archetype. My archetype is a technical lead. So for me, what it was is...
Don't do what I did. For me, what it was is: by day, you're a technical lead. You're basically helping everybody on your team succeed. At one point I had almost 30 people that I was the tech lead for, and at those levels, that's closer to a senior staff scope. But at Meta, you're never really at the level that you're at; you're usually performing at the next level, so everybody is one up from what they tell you. So by day you're doing that: you're making sure there's the right scope on the team, that the projects are going in the right direction. And it's weird, because you usually have more information. I've dealt with a lot of really sensitive information, like
the way that we were approaching a lot of new regulations on privacy and ads. I had more context on that than a lot of people on my team, and so I had to guide them without doing things that would be illegal, like telling them what they're actually working towards sometimes. And that's a full-time job. But in addition to that, at Meta you're expected to still contribute as an engineer. So I had projects that were quite a bit more challenging, but maybe a little less time-sensitive sometimes.
That's the thing that I learned at Adobe, because at Adobe I was working on a really big project and I was the technical lead on it, coordinating like 80 people around it.
And I, being young and naive, was like, I'm going to take the biggest part of this and I'm going to implement it. And I became a bottleneck so quickly, because I needed to manage all the pieces and I needed to write the biggest piece. So as a staff engineer, once you get burned by this a couple of times, you start taking on pieces that are a little bit on the side. Or if somebody gets really stuck, you're there to unstick them. Oftentimes it's zero-to-one projects: you're like, I'm not sure how this is going to work, so let me take the first stab at it, and then when I'm pretty sure that we're in the right direction, I'll hand it off. There are lots of handoffs. At Meta, you onboard onto a new project every couple of months, because you need to move it forward. Usually you come in because something is wrong, or nobody knows what they're doing there yet. So you go in and you move it. And once it starts moving, the idea is that your time is better spent on another project that's not moving right now. So you can hand it off to an E5 and say, I can tell you exactly what needs to happen here in the next couple of months, so you can get started, but I need to go off to a new project. So yeah,
there's some amount of stress there, you know. And my advice for folks, honestly: oftentimes people look at the levels, and because they're numerical it's easy to measure your progress. It's like, oh, I'm a six, I want to be a seven, right? But I'm like, do you? Do you really? Because at bigger tech companies that have gotten their compensation straight, which not everybody has, but at the companies that have understood what compensation should be, you can make more money as a good E5 than as a crappy E6.
Because you're going to get better refreshers, you're going to get more bonus. In the long term, a good E5 at Meta is going to make more than a crappy E6. So if you're going to go to E6 to be a good E6, then yes. But understand that that's significantly more work and significantly more responsibility. And so when people are just like, you know, I need the money or whatever, I'm like: go ace your job. You will get rewarded for that. You will get a better rating, which will equate to more refreshers, which will equate to a bigger bonus, you know? But
yeah, don't do it just for the numbers. Do it because whatever the responsibility of the next level is, that's what you actually want to do. And for me, it was. You know, I quit my job in tech. I'm currently unemployed, sorry, self-employed. And what I'm doing with my time full-time now is guiding ML engineers through their careers. I still do a lot of one-on-one, and I'm trying to scale that through the YouTube channel, the podcast, whatever.
But that's the kind of work I would do if nobody was paying me, as evidenced by the fact that I'm currently doing it with very few people paying me for it. So to me, I had to get there. That was the conversation we had in 2020: I was like, no, you don't understand, this is what I want to do. If that's not what you want to do... I met a guy at NeurIPS, a really smart guy, has been in the industry since the nineties, knows the ins and outs, works on Gemma at Google.
And he's an E5. And I'm like, do you want to move up? And he's like, why? I've got everything I want. What am I going to get as an E6, or L6 at Google, that I don't get as an L5? And I'm like, you're absolutely right; that is a very smart way to look at your career, because up to E5 you do have to move up. E5 is a senior engineer at these companies.
A senior engineer is somebody who can work independently: you can give them a project and say, here's the first couple of things, but you go figure out the rest. And we all want to work with colleagues like that, because otherwise we're going to get burnt out and die. So up to E5, at Meta and Google and all these companies, it really is up or out, right? If you can't get promoted from E4 to E5 in a certain amount of time, you're probably not going to stay at that company for very long. I think you get like five halves or something like that, so two and a half years to figure out how to be an E5. But E5 to E6 is completely optional. E6 to E7 is completely optional. We had two E8s in the Meta ads org, which is a huge org that generates all the money. We did not have an E9.
And by the way, the way that you interview for an E9 is: Mark Zuckerberg picks up the phone, calls you, and says, hey, will you come work for us? If you say yes, you've cleared the interview. So that's how you interview for an E9. So just understand that it gets exponentially harder with every level. E5 is the expectation. Everything beyond that is, what do you want to do with your career? Yeah.
What can you expect from E6 to E7? Because you mentioned how you're mentoring all these folks or you're leading bigger projects and you're helping unstick projects. And that's like your sweet spot.
I didn't understand if that was your specific archetype or if that's what everyone is expected to do. Yeah, that's my specific archetype. There was another E7 at Meta that I worked with quite closely who basically just owned all of the revenue. Whenever there was any issue, any outage or whatever, RJ would come in and fix it,
or coordinate a team to fix it in real time. So that's the amount of responsibility, right? All of revenue. No biggie, just a couple of billion dollars between friends. And we had E7s who were really deep on what we call signal loss, which is: there's a lot of regulation, there are a lot of things coming out of Apple, for example, where they're like, okay, you can't use our data for ads anymore, or whatever. And when something like that happens, it's probably an E8 that starts on it, who goes: okay, this is going to impact the entire company. Let's understand what this is going to be and start setting up initiatives for the next five years. And then the E8s delegate to E7s: okay, this is a particular stream of work we need to get done; you go find the teams that you need for this and you coordinate them however you want. So it's basically less and less defined, the higher up you go, what you do. An E9 just kind of walks around and says, oh, you know what, I bet I can get us 1% more revenue tomorrow if I do that. And I talked to a guy that I really want to have on my podcast; hopefully he'll come.
I talked to him about the difference between an E6 and an E9. And I was like, you know, as an E6, I got my teams to increase Meta revenue by 1%. In fact, that was Q2 of '23, or Q1 of '23, where our work basically showed up on the earnings report. We exceeded by 1%, and my team had increased the revenue by 1%. So I was like, huh, that's interesting, that's us.
And his response was: an E9 does that by themselves. So you do that with a team of people; an E9 does that by themselves. And I was like, oh, okay. That is different. So, yeah.
Yeah, this is all fascinating to me, especially because I've lived and played in startup life my whole career, and the large enterprise is so foreign, especially these tech-forward enterprises that are very sought-after jobs. There are a lot of people who want to be in these jobs, I think mainly because of the earning potential. Yeah, the money is incredible. I don't think I would make as much money anywhere else as I made as an E6 at Meta.
Nobody's like, my life's calling is to optimize the privacy ads stuff, or whatever the hell you were doing when you were there. Well, I can see that, right? I lasted a little bit over a year. The average tenure is like two years; at Meta it's a little bit higher, but in Meta ads it's a little bit lower. It's a lot of churn and burn. Don't get me wrong, I'm glad that I had that experience, I think it was a good step in my career, and I'm glad that I did it. There are some people who stay there for a long time, but that's not the majority. A lot of us just churn through one of those, then go to another big tech company, and then come back, you know. But yeah, it's really hard to stay in place. But I imagine in startups it's pretty hard too. I've worked at startups a little bit, and it's like a different company, right? When it grows. If it fails, then you have the problem of, I don't want to work for a continuously failing company. And if it succeeds, then it becomes a very different company. It goes from 20 people to 600 people, and suddenly everything is different. Yeah. You have HR. Yeah.
Yeah. And you have to navigate politics and you have to navigate the company, whereas before you could just say, I want to do this, like you were saying with the E9, and you pretty much had carte blanche to do it. Yeah. A lot of distinguished engineers actually do come from early startup engineers, like the founding engineer or the CTO of a startup; lots of them make those distinguished positions. At Adobe, about the only way that I knew of to become distinguished was to be acquired: you were the CTO of the company that was acquired. And they're vanishingly few. I think at Adobe we had 20 when I was there; it's under 100 at those levels. And, you know, I don't know if Guido is even... well, he probably is distinguished, but if you wrote Python, I will let you slide.
I think that's worth it. And it's like the guy at Amazon is the guy who developed Java or something. You really have to do huge things in order to get to those positions. You don't apply for them. There's no job opening on the Meta website that says, come be a distinguished engineer. And you don't need a resume by then; by then, you have to give talks.
By then, there's only one way forward. No, you're getting courted to give talks. It's one of those type deals. Also, if you are an E8 or E9 and want to come work at Meta, you basically just have to call them up. You don't have to wait for a position to be open. They will always hire E8s, E9s, anybody who can perform at that level. But...
Well, sweet, man. We're going to put all your details, your YouTube channel, and all the stuff that you're doing into the description, so anyone who is trying to go on this journey can follow along with you and learn a ton from you. The other thing I want to mention is that you have been super kind in helping us put together this asset on what it takes to go from senior to staff. We're doing that in the community because, like you said, there's not a lot of information out there. So I'm glad we can work together on that and hopefully have something see the light of day. It's still early days, so you never know if it's actually going to happen, but I've got confidence. We'll make it happen. We'll make it happen, that's what I love. And I also look forward to when you have the podcast with the ex-director that you worked with; maybe we'll play it on this stream too, so the folks who are looking for part two, now that you've dangled the carrot, will have it as well.
Excellent. Yeah, I'll definitely keep people informed. And I am in the MLOps Community Slack too, so if anybody needs to reach me directly, you can. I sometimes take a couple of days to reply, sorry about that. What happens when you're unemployed... what did you call it? What was the phrase? The change of frame? Self-employed, I guess. Self-employed, that's what I was supposed to say. I love that perspective shift. That is a great one. Yeah. But, I mean, it was a very intentional move. It's not like I didn't have options; when I was leaving, there were a bunch of companies who were like, oh yeah, with your experience, we really want you. But I'm like...
I want to spend some time, maybe half a year, maybe a year, really focused on trying to help people. And maybe that's what I do for the rest of time, maybe not. Maybe I do come back. But I do want to see how I can scale the things that I know work, right? I've worked with people one-on-one for a long time. I've worked with my teams for a long time. And it was really... I was helping a principal engineer go through the interview process, and he was the first person who scheduled some insane number of sessions with me. Usually people do three or four and then they basically move on. And he was like, no, no, no, I need to pick your brain on a lot of things. And then when we were done, he was like, wait, why don't you scale this? And I'm like, I don't know how. He's like, well, go start a channel or something. And I was like, okay. That's the other thing you learn at staff: when a principal tells you to do something, you don't second-guess them too much. You understand how much smarter they are than you, so you just go do it.
So that's where this came from: people have asked me to scale it, and I'm trying to. It's still early days. You'll see, if you go check out my YouTube channel, that there are some bells that are a little bit loud, and I don't really know audio editing yet. Demetrios is helping me, so I'll get there. But yeah, I'm definitely excited to be helping so many people.
My biggest video right now has like 14,000 views, and a lot of them are unique viewers, so let's say 10,000; I don't remember the exact number. And I'm like, when have I ever talked to 10,000 people? Yeah, you're making an impact. Yeah. 10,000 people is a concert. It's a small concert. I'm not Taylor Swift yet, but it's a small concert. But it's a concert, you know? And so it really is a great opportunity, in our time, to be able to help so many people in such an efficient manner.