Brendan Foody is the co-founder and CEO of Mercor, a company building the infrastructure for AI-native labor markets. Mercor's platform is already being used to label data, screen talent, predict performance, and evaluate both human and AI candidates. It's a really interesting company at the intersection of recruiting, evals, and the core work of improving foundation models. Brendan's team recently raised $100 million, and they're working with some of the most sophisticated companies in AI today.
Our conversation today hit a lot of interesting things, including what role humans will play in labor in the future. We talked about the types of data labeling that really matter to improve models going forward. Brendan reflected on Mercor's rapid ascent and some of the key decisions he made. And we also hit on where AI does and doesn't work in the hiring process today. All in all, a really interesting conversation. I think you'll really enjoy it.
Before we get to the episode, I just have one plug. So many of you have rated the show on Spotify or Apple, and thank you so much for that. If you're enjoying this, please consider leaving a rating for the show on either platform. Ratings help us grow, which helps us continue bringing on the best guests and keep the bar really high for the conversations. Now, here's Brendan Foody.
Well, thanks so much for coming on the podcast. Really appreciate it. Yeah, thank you so much for having me on. I'm a big fan, so excited to chat. Yeah, excited to have you here. I figured we'd start at like the highest level place, which is for our listeners, I'd love if you could just contextualize, like, where are we today? What's the state of AI evaluating talent? Like, what works? What doesn't? What's going on? I'm...
amazed at how good it is. Like, I think that everything that a human is able to evaluate over text, the models are close to superhuman at, whether it be the transcripts of someone's interview, the assessments that they're filling out in a written way, or even the signals on their resume. And
It's a fascinating dichotomy because so little of that has actually been distributed in the economy, right? And so there's just this like huge green field associated with doing that. And it's one of the things we're really excited about working on and building out. - Yeah, were there things that didn't work pre-reasoning models that like, you know, maybe talk about the last six months, like as these models have gotten better, what's like finally started to work for you guys?
Yeah, I remember back at the end of March 2023, when GPT-4 came out and we were building our first prototype of an AI interviewer, and nothing worked, right? It was like the model would hallucinate every two or three questions and all of that. And so it's just been riding this
incredible tailwind over time. Obviously the knowledge in the models improved a lot in sort of the first year. And then the reasoning models have made them much better at handling a lot of context in particular, figuring out what matters, what to focus on,
et cetera. It's been really cool. Still, there's multimodal things that the models aren't as good at, just because it historically hasn't been as much of a focus of the labs and it's a lot harder to do RL with, but we're excited about that being added soon. Yeah. What are the milestones where you're like, man, I can't wait till the model can do X or Y or... Yeah, there's a handful of things. There are certain things that humans are very good at, like this like
vibe check of whether I would enjoy working with this person, whether this person is passionate and they seem really genuine about what they're saying. It's really hard, right? It's hard for the best humans, let alone models. And so I'm really excited about that and building evals out for a good chunk of it. But whenever I read through the reasoning chains of the models and try to decipher things in the eval, I'm always thinking like, wow, the
the model seems a lot more reasonable than whatever researcher on our team was creating the eval, right? And so it's...
It's really incredible how fast they've improved. And I think everyone obviously is seeing everything working in code, but we're just in the early innings of a lot of other domains that are taking off in an incredible way. And obviously it seems like a big part of what you're doing is basically coming up with evals for humans and how good they'll be at jobs. Obviously we have all these people creating AI employees now. It's like, hey, agents are going to do this. You'll have an AI agent doing this set of tasks that an employee would do.
Do you guys play into this at all? Absolutely. So, I mean, we do a huge chunk of this. Maybe giving a little bit of the backstory of the company, the reason we started is that we felt like there were incredibly talented people all around the world that weren't getting opportunities. And the primary reason is that labor markets are very fragmented, and that a candidate somewhere else in the world, maybe it's remote in the U.S. or another country, was only applying to a handful of jobs. The company in San Francisco was considering a fraction of a
percent of people because there's this like matching problem that they're solving manually. And through applying LLMs, we could solve this matching problem so that we could build this global unified labor market that every candidate applies to and every company hires from. But then we realized that there was this huge takeoff in hiring people associated with these new knowledge work roles around evaluating LLMs.
Now, we hire all sorts of experts for the top AI labs, who use our technology to help facilitate that, both for creating evals to evaluate our experts as well as to evaluate the models and all of these agents that you're discussing. Maybe for our listeners, too, on the Mercor side, you guys obviously have a bunch of uses of AI in screening candidates and going through resumes.
Can you talk through some of the different use cases that you have for AI and then what the stack looks like that you guys are building on today? Yeah, I think a good heuristic is just thinking about all the things that humans would do manually, creating evals over those, and seeing how we can automate them. So similar to how a human would review a resume, conduct an interview, and then rank people or decide who should be hired.
We automate all of those processes with LLMs. We have evals for how accurately we're parsing the resume, how accurately we're scoring different parts of the resume, how accurately we're asking questions in an interview and evaluating that interview, and then passing all of that into model context along with references and every other kind of data that we have on a candidate to make the end prediction around how well they'll perform. Is it mostly off-the-shelf models and you're curating the evaluation and context around them?
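(As a rough illustration of the staged pipeline Brendan describes, here is a minimal, hypothetical sketch. Every function name, prompt, and the `call_llm` placeholder is invented for illustration; Mercor's actual stack is not public and is certainly more involved.)

```python
# Hypothetical sketch: each step a human would do manually (parse the resume,
# score it, evaluate the interview, predict performance) becomes its own LLM
# call with its own eval. call_llm stands in for whatever chat-completion API
# is actually used.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    resume_text: str
    interview_transcript: str = ""
    references: list[str] = field(default_factory=list)
    scores: dict[str, float] = field(default_factory=dict)

def call_llm(prompt: str) -> str:
    """Placeholder: swap in a real model call here."""
    raise NotImplementedError

def parse_resume(c: Candidate) -> str:
    # Evaluated separately: how accurately are fields extracted vs. a labeled set?
    return call_llm(f"Extract structured fields from this resume:\n{c.resume_text}")

def score_resume(parsed: str) -> float:
    return float(call_llm(f"Score this candidate's resume fit from 0 to 1:\n{parsed}"))

def score_interview(c: Candidate) -> float:
    return float(call_llm(f"Score this interview transcript from 0 to 1:\n{c.interview_transcript}"))

def predict_performance(c: Candidate) -> float:
    # End prediction: everything known about the candidate goes into context.
    prompt = (
        "Given these component scores and references, predict on-the-job "
        f"performance from 0 to 1:\nscores={c.scores}\nreferences={c.references}"
    )
    return float(call_llm(prompt))

def run_pipeline(c: Candidate) -> float:
    c.scores["resume"] = score_resume(parse_resume(c))
    c.scores["interview"] = score_interview(c)
    return predict_performance(c)
```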
Yeah, there's a lot of off-the-shelf models for more basic things, but the post-training comes in particularly for the hardest problem of making the end evaluation of a candidate: learning from all of the data we get from our customers about who's doing well and for what reasons, and how we can learn from those signals to make better predictions around who we should be hiring in the future. Have you learned anything about
anything surprising about those signals, something that the AI found where you thought, you know, maybe this isn't how I would have thought about it, or how our humans would have thought about it? Yeah, there's all sorts of things. I think that
One of the key benefits of AI is that it's able to just go way more in depth about like everything about a candidate. And it's able to pick up on all the small details that humans sometimes miss or like, you know, the vibe check sort of skips over because people already have their mind made up on a candidate. And so there's all sorts of like little resume signals. If people have demonstrated extreme interest in a particular area where they're just doing it
for fun, as you would anticipate, all the way to signals like whether someone studied abroad in the country where they'd be doing the end job, so they might communicate better and be a better fit in that work environment. There's lots of those little things that come up and are very specific to projects and customers. Are there certain things that you see will always be done by people? You were talking about the multimodal stuff, but I guess, how do you see...
AI and human interviewers working together versus a world where it just kind of goes all AI assessment? At a simplistic level, the hiring process involves assessing candidates and selling candidates. And the assessment part, I think, is going to soon get so good from LLMs that it'll sort of be foolish to think we know better.
Right? People just take the recommendation because it'll have proof that it is performing so much better on the eval, on the end outcome that customers care about. Where humans, I think, will still continue to play a really large role is in the selling process, because this is a person that we're going to be working with and spending time with.
And I think about it as enabling human recruiters and hiring managers to spend all of their time on the candidates they want to hire, rather than all these interviews of people that they don't end up wanting to hire. And so really, yeah, unlocking them to help people better understand the role, better understand the people that they're going to be working with and all the things that they should be excited about. Yeah, I love that. Will people start gaming the assessment? Is that something that you've seen at all? I guess...
The LLMs are picking up on certain things if you put in this keyword. - They all decided they studied abroad. - They all studied abroad in the place where they're recruiting for. - Yeah, it's why sometimes you have to be a little bit secretive about the signals, right? But yeah, I mean, we have so many things where we deal with this, as every large hiring process does.
And so I think the key is ensuring that assessments are relatively dynamic, either that the problem they're working on is changed frequently or that you're asking them super in-depth questions about a particular part of their background. Because there's so much in the way of talent assessment that becomes possible when the models are able to do immense preparation for an interview, right? Like when I'm doing a first interview of an executive candidate, like, everything...
Maybe sometimes I'll have references on them, but most of the time I look at their LinkedIn profile for a couple of minutes. I have like some preliminary notes. But imagine if I could go like listen to a podcast that they were on, right? Go read through like blog posts that they've written, all of the papers that they might have done during their PhD and ask about those things, right? You can get way more in depth and nuanced in a way that's very hard to game. Obviously, you have these models that are pretty good at predicting how well these candidates will do. To what extent does it matter that that's explainable?
Or could these models just be, you know, a black box, like, yeah, this person's going to be good and this person's not? Yeah, I think it does matter that it's explainable, for sort of two reasons. First is for customers to understand and trust those claims, right? Like building trust and understanding
all the reasoning chains. And then the second is obviously making sure that the models are selecting people for the right reasons, the reasons that they should be considering. And so it's beneficial, but I think the end state of the economy is probably just that, like, it'll be, you know, some sort of API or interaction where people want work done or they need some level of human involvement,
and just a confidence interval on how well that person will perform on the job. And there's far less of the intermediation that humans play in the process. Yeah, it's like an interim trust milestone on the way there. Exactly. No, it makes a ton of sense. And then obviously, you know, today in kind of the first area, or one of the areas where you have a lot of fit, the data labeling side, there's kind of these clean feedback loops. Like, you know, I imagine you could even score how accurately people are labeling, and you probably have multiple people looking at the same pieces of data. Talk about some of the challenges maybe in translating this to like
maybe more vague domains of human work. Totally. I mean, like venture capital. Yeah. Wait 15 years and then you get your feedback loop. Yeah. One way I think about it is like, if you have a hundred people that are all doing the same job, it's very easy to stack rank them versus if you have a hundred people doing a very different job, right? Like founders, right? Like they're all working on something that's nuanced in one way or another. It's very difficult to like pattern match
Like, what is the thing that they said or the thing that we learned that actually translated to the outcome? Because there's just like so many confounding variables in the equation. And so I think that it's going to be relatively easy for the larger pipelines of roles. Like if you're hiring 20 account executives, right, stack ranking all of them, learning from those signals. And then the models are starting to be able to learn from
these much more complicated things where everyone's working on something else. Like we're doing a ranking of a bunch of the Thiel Fellows, and that's a fun case. But it definitely is more challenging and relies more on the underlying reasoning capabilities of the models. Maybe just talk through what are some of the challenges that emerge in doing that?
Yeah, well, it's basically that oftentimes there's a lot of things that aren't in model context, and so models struggle to learn from that, and people forget to add it to model context. So maybe it's like, I heard my friend say this good thing about using this company's product, right? Or these things
that might not be making their way in, making sure that all references are added, all of the interpersonal stuff that humans might pick up on. So we found that actually, oftentimes, just making sure the requisite data is in model context is the majority of the problem. - Yeah, I guess in the future maybe we're just recording every conversation with our smart glasses. - Yeah, yeah. - Easy enough to feed into the model. - Bridgewater had it right all along. - Yeah, exactly, exactly. Is that where we're headed? Is it just gonna be Bridgewater at scale?
We'll see. I mean, I think, of course, a lot of companies will be averse to that. And I think there will be regulatory reasons and legal reasons people don't want to do that. But I also think there's just going to be better processes for how models help get this information in context, right? Like maybe it's AI doing an exit interview of the manager and the people on the teams to help better understand what was going on, because all the people have so many details in their head, right?
around this that we just need to get into the models for them to be able to make these superhuman predictions. Yeah, there are certainly more and more both founders and all kinds of people that are bringing AI to their meetings. And so I think a lot of those meetings and interactions will be recorded for AI to learn from. Totally. I think that'll be interesting. We need you to take our transcripts and stack rank us against each other. Only if I come up on top of you.
What do you think of the data labeling landscape today? How do you see the different players kind of differentiating from each other? It seemed like Scale was really in a position to run away with it, but now there's been a bunch of kind of new players in the landscape. How do you think about that world? Yeah, I think like
The key thing that most people don't understand in the data annotation and evaluation landscape is just the shift in the market and how dramatically different it is from what it used to be two years ago. Because when ChatGPT came out, the models weren't that good. It was easy to trip them up. They were making mistakes left and right. Even a high school student...
or like college undergrad could do a lot of completions or evals to help improve the models in this crowdsourcing fashion where they run these huge pipelines to get hundreds of thousands of pieces of SFT or RLHF, SFT being inputs, outputs, RLHF being choosing between a bunch of different like preference options like you would see in ChatGPT.
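(As a rough illustration of the two data types Brendan is contrasting, here is a minimal sketch with invented field names; it is not Mercor's or any lab's actual schema: SFT as input/output pairs, RLHF as a preference choice among several responses to the same prompt.)

```python
# Illustrative only: simple record shapes for the two legacy data types.
from dataclasses import dataclass

@dataclass
class SFTExample:
    prompt: str       # the input the model should learn to handle
    response: str     # the output a human expert wrote or approved

@dataclass
class RLHFComparison:
    prompt: str
    responses: list[str]   # several candidate completions shown to an annotator
    preferred_index: int   # which one the annotator chose, ChatGPT-style

sft = SFTExample(
    prompt="Explain to a new analyst what a confidence interval is.",
    response="A confidence interval gives a range of plausible values for ...",
)
rlhf = RLHFComparison(
    prompt="Summarize this earnings call in three bullet points.",
    responses=["Summary A ...", "Summary B ...", "Summary C ..."],
    preferred_index=1,
)
```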
But as the models got really good, that crowdsourcing model started to break because you needed really high quality people that would work directly with researchers to help them understand why is this model doing well? Why is it not doing well? How can we create this really complicated data that helps to trip up the model and actually reflects like the real world things that we want to automate? And so our platform of finding exceptional people that you would want to work with
was perfectly positioned for that, in that we can hire these really high-quality people super quickly. And that caused us to take off and have all of the traction working with the big labs. And I think that trend will continue, in that the companies that are stuck in these super high-volume crowdsourcing pipelines
are certainly going to see a lot of churn. And it's going to be the new players that understand the direction the market is headed and lean into really high quality talent underpinning it that are going to continue taking a lot of market share. Do you think there'll always be demand for
I guess, humans in the data labeling process. There's obviously more and more that can be done with these models, or a big model gets really good at one task and then can train a smaller model. How do you see that evolving over time? Yeah, the way I think about it is that so long as there are things in the economy that humans can do that models can't do, we will need to create
evals or RL environments so that models can learn how to do those things. So I think there's certain domains where that's just going to get solved sooner than others, right? Like within math or even many parts of code, like you don't need that much data. It's super verifiable. The models will solve those problems. But then there's other domains that are like much, much more open-ended. What makes a good founder when we're assessing them, right? Or, you know, honestly, like a large
chunk of knowledge work domains, maybe a majority of them, are these open-ended problems that are really difficult to verify and understand what good looks like. And you just need to get all of that understanding that the models don't have into the models. And that's why I expect like orders of magnitude of increase in
the human data and evaluations market over time. If I understand correctly, you guys, you know, clearly I think one of the initial arbitrages, and what inspired the company, is that you have these great coders all around the world and, like, you know, they're not getting access to some of these jobs. And obviously that ended up being really important for coding data. Um, you know, obviously you've expanded into other areas as well. Like, you know, coding, again, it's a perfect, like, RL use case, and probably also really perfect for evaluation. Like what have you had to change or improve as you've gone into some of these fuzzier domains
and recruiting people in those areas. Yeah, I think that leaning on a lot of the heuristics of what a human would do manually is probably a good way to do it. So, for example, if you want to automate being a consultant, how do you assess consultants that can help to do that? Giving them a case study, maybe one that's specific to their background. Maybe it's a silly question, but you guys are all probably great coders, and so I imagine you know how to evaluate coders. Like, if you're starting to get a doctor on the platform, how do you even know
what the heuristics are for humans. Well, I think the point you're getting at is really interesting, which is that as you start to get into domains beyond the machine learning team's capabilities, they need to have these experts. We need to have doctors that are helping us create our doctor assessments and our evals for what makes a good doctor, as well as, um, you know, a bunch of other domains. And similarly, it's what the researchers need to do with all of their technology, right? Like when we were all going off LMSYS, it was super easy to, uh,
you know, look at like the high school level physics and say like what problem was right or which one was slightly better. But when it's like PhD level chemistry and the researcher doesn't have a PhD in chemistry, it's really hard to understand what's going on to interpret these evals to figure out how we can improve them. And so I think that
That's the other big shift to your question earlier around evaluations is that both for assessing our talent as well as the way the researchers assess models, it's just going to be this much more collaborative process and working with people to help trip up the model and improve capabilities. I've heard you talk before about how this kind of like short-term data label contract work is like, it's kind of the perfect initial market for what you've done. And there's a massive amount of demand and it's kind of this wedge to like eventually doing just kind of like end-to-end labor markets.
I'd love to just hear you riff a little bit on what's the sequencing of the company look like from here toward that vision? Yeah, well, I wrote our secret master plan that goes over this a little bit. But the way I think about it is the reason that marketplaces are generally hard to build is that they're very network effect intensive. And so the thing that makes them defensible also makes them hard to build. And so it's important right now that we're very focused on drilling this wedge of huge amounts of demand that we have to...
expand the network effects, grow the marketplace, and focus on that right now. But then we're also starting to see a lot of demand for hiring high volumes of contractors from our existing customers at big tech companies, where they might need hundreds of data scientists or software engineers or whatever the role is for a particular domain outside of human data, which is really the exact same kind of request. It's just a little bit more of a legacy market where you'd be going up against the Accentures or Deloittes of the world historically.
And so leaning into that as the second main focus, and then expanding to all sorts of full-time hiring. But one of the key things is that, over the lifetime of the business, we've been doing all of these. Like, even the first year of the business had nothing to do with human data. It was just hiring contractors for our friends and for ourselves, many of whom became full-time employees.
And so it's much more continuous. And there's a lot of things that unify them, in that we know that all companies want more candidates, they want to be able to hire them more quickly, and they want confidence that they'll perform well. And so if we just measure those things and improve them over time, that'll position us for every stage of the business. Yeah.
Was there a moment that it was obvious to you to lean into the human data side? It was just so abundantly clear, this is where to... Yeah, I remember it was while I was still in college. So, I mean, the background of the business is I met my co-founders when we were 14, in high school. We were all on the speech and debate team together. They were winning all the tournaments. I wasn't as good as them, but I was building companies. And then we started hiring people internationally at the IITs in India. We partnered with IIT Kharagpur's coding club, and...
We were amazed that there were these smart people, as you're mentioning, that weren't getting jobs. And we felt like we could hire them to build projects. Our friends wanted to pay us to hire them. We could take a small fee. So we hustled a lot, bootstrapped that to a million dollar revenue run rate. We'd profited 80K after paying ourselves before dropping out, which I was very proud of.
But the parents still weren't satisfied with that, of course, until we fundraised money. But to your question, in August of 2023, one of our customers intro'd us to the co-founders of xAI while they were still working out of the Tesla office. And he said, Mercor has these really smart engineers in India that are phenomenal at math and coding.
And then the next day, one or two of the xAI co-founders got on a call with me and our team, and they were just really excited. And then two days later they had us into the Tesla office to meet with the entire xAI co-founding team except for Elon; it was right before one of their meetings with Elon. We were still in college, right? Like, this is insane. And we were just like,
wow, why do they want what we've built so badly? And it's because there was this change that was happening so fast in the market that no one else had realized yet, right? And now, of course, we've scaled that up and are talking about it because we have a critical mass of
of the market share. But that was the point. They weren't ready for human data yet, though. And so it wasn't until, call it, six months later that we started working with a lot of the frontier labs and really scaling up the business. You could see the tidal wave coming. Yeah. Yeah. I think one thing I've realized over time in founders looking for product market fit is that
People try to force things too much sometimes. It's like you need to just look for the signs of the market where it's like, wow, there's gold to be found and just drill after that. Because if it's hard to get an initial sale, then it's going to be hard to scale up the process. You need to rather look at what are the really strong pain points where the wealthiest companies will pay whatever it takes and just sniff those out and then lean into them.
I guess you've expanded beyond coding. Like, maybe to go back to the doctor example, because I was struck by when you were describing it, one, you know, in some senses, like,
evaling what a good doctor is, is actually what you're eventually going to bring these people into the model companies to do. They're going to figure out, is this the reasoning process that a good doctor would use? What are you actually doing when you're working with someone to do evals? Yeah, I think that one of the key things that humans are a lot better at right now is learning over time from the instructions, from the training, from all the feedback. And so
we're looking for these proxies that people have demonstrated, like, you know, asking the right questions about the problem, going about thinking about it in the right way, having signals in their background that indicate they've been in these high-performing environments where people are obviously learning significantly over time,
and all of those translating to them finding ways to trip up the model and improve capabilities. Do you guys use your own product today? And how does it get used in your own hiring process? Absolutely. We use it for every role except our executive roles. I mean, we still have the listing for our executive roles, but for most of our executives, I would take the first interview rather than sending them straight to the AI interview, for the selling reason, not the vetting reason.
Yeah, I mean, it's extremely effective. In fact, we've found that in many cases, it's the most predictive signal. I think one thing people underestimate in hiring processes is that humans have this very strong bias towards thinking that they're right in this vibes-based assessment.
And like, I guess hiring is like the original vibes thing, right? Definitely do not, do not suffer from that. Yeah. And it's like, let's ground everything in the performance data of who's actually doing well on the job. I remember actually, so we have this role we're hiring for, strategic project leads.
And we used to have a human case study before the strategic project lead onsite. And the onsite is working with us for a day to see how they would do on various parts of the job and figure out who to hire. And then we switched over to fully an AI process before the onsite. And the conversion went up on the onsites. And so it's like through using the AI interviewer, just being a lot more objective about the comparisons.
having it standardized throughout everyone who's applying to the role rather than just mixed across three different interviewers, it was allowing us to have a lot better conversion. What about on the eval side? Are you guys using a bunch of people that you source for your own evals? Do you do a lot of that internally? Yeah, we use a lot of people-- or we work with a lot of people from our marketplace to create our own evals. And so it's a similar process to what we go through with our customers.
And of course, we still need the researchers involved with those people, understanding what are the reasons that the model is making mistakes, how can we create our taxonomy, have our post-training data reflect that taxonomy, and hill-climb on the eval.
But it's all the same processes and people. Obviously, you talked a little bit about using multimodal capabilities to determine passion and other things. What other things are you thinking about with incorporating video and other things that are futuristic for the platform? Yeah, one thing that...
I think about a lot is what role RL will have in the timeline to improve video capabilities, in that RL is really good at these search problems. And video is just a huge amount of tokens. That's why models struggle with it. And so it's, in many ways, the search problem of how do we look for the signal where that person was really excited about the particular thing, or if they cheated on the interview, or what other things we could find in multimodal context.
And so I think a lot about how we can effectively create the right data to get the model to pay attention to those, as well as a lot about what the frontier labs are doing to improve the capabilities in that space. I mean, obviously, it seems like even in the course of a few years, the data labeling market changed so much. As you think two years from now, where do you think this is all going? Do you think this is actually a part of your business, or in general,
in two years, is it like only the expert of experts that are required? - I think it's a huge part. And the reason is, like I mentioned at the beginning, we started the business because of this notion of labor aggregation, and that it feels like the way labor is allocated in the economy is wildly inefficient, and we could make that much more efficient. But a big part of that is making a bet on what humans will be doing in the economy in five years. - Please tell us. - Which is a huge question for everyone.
At least everything I'm seeing is leading me to believe that it's far more structurally efficient for humans to create evals over the things they don't yet know how to do or models aren't able to yet do than it is for them to, like, redundantly do that task all the time. And so I actually...
think it's highly probable that a huge chunk of knowledge work just trends towards creating evals. And it might not be the rigid context that we have right now of people working in annotation tooling. It might be much more dynamic, like talking to an interviewer about how to solve their problem. But I think that that is going to...
be just like a huge part of the economy. And it's one thing that I think very few people are aware of yet because so many of them conflate it with what's happening in the SFT and RLHF market, where a lot of those data types just aren't as useful as they previously were and budgets for them are coming down. What do you think will be the most interesting skills for people to develop or kind of, I don't know, if you were to advise someone,
that was in school, what to study or focus on, where would you steer them? - I would definitely optimize for a fast rate of learning 'cause things are changing so quickly. It's hard to know. There's so many of these things that people didn't think the models would be good at for a long time that they just got really good at really fast.
I would say work with AI as much as possible. One thing I hear from people in our marketplace is that they love the fact that they just get to play around with these models all day. They get to spend hours thinking about a problem that the model's not going to be able to do, and what the things are that the model is missing out on.
And they say that they build a lot of valuable skills that help them to know, in their workflow as a McKinsey analyst, where should they be using AI, where should they not be using AI, et cetera. And so I think just spending as much time with the models as possible and getting very familiar with the things that they're good at or bad at in a particular domain is really helpful. But it's hard to say, like, go be a software engineer. Yeah. Yeah.
Yeah, it's interesting, like, you know, obviously, yeah, your point that so many more of us will be spending time training these models, and, like, you know, there's almost an infinite amount of things. Obviously, there's hard skills that have right or wrong answers, but then there's so many just subjective things, and maybe in the future, I don't know, we get paid to just train our own individual models for us. Totally, totally, yeah. I think that'll be a big part of it. I would say one other thing is that people should focus on domains where demand is very elastic. And so an example is,
I think there's demand to build 100 or 1,000 times more software in the economy. And maybe it's not 1,000 times as many web apps, but it's more feature iteration on existing products, better ranking algorithms, whatever it is, versus other roles where demand is probably more fixed.
we only need so many accountants, right? And so much of an accounting function. And so focusing as much as we can on those things that there will be vastly more demand for when we're able to increase total productivity is probably also a safe bet. Yeah, that's a great way to put it. I had a founder I was talking to the other day, and he was like, for all this talk about software engineering going away, I really could use a lot more software engineers. Yeah.
I know. It's something I'm really excited about. If they made our software engineers 10 times more productive, we'd probably hire more software engineers, right? Totally. I think that there's always interesting curves around demand and how pricing will affect it over time. I mean, obviously, I imagine when you started, there was probably a temptation that you could have built a recruiter co-pilot, or built software for staffing agencies. You've obviously decided to go end-to-end. Was that obvious from the start? How did that kind of come about? Yeah.
I think part of the start was just shaped by... I think we had a lot of benefit of just approaching the problem from first principles because we hadn't seen how it was done. We knew the problem our friends wanted to solve is they wanted to work with a software engineer. And so we would just handle everything associated with getting the software engineer that will perform well to be working with them. But in hindsight...
I think that there's just many more businesses that will trend towards that because it doesn't make sense to build a copilot for a job that probably won't exist, at least in nearly the same way that it does. It probably makes more sense to have this end-to-end process automated in a way that it's able to learn from the feedback loops and make better predictions.
Yeah, though obviously in your case, I think you benefited from the fact that this data labor market is actually perfect in that, at a time of relatively nascent capabilities, you can do it kind of end-to-end, right? And I'm sure if that didn't exist, I imagine you might have had to go co-pilot for some of these other more complex roles. I think that's absolutely right, right? Because it's like, if you're hiring full-time employees, then obviously, definitionally, people want to have them on their payroll. And so I think that is one thing that we were fortunate about, is that our
operating model and the way that we'd structured a lot of the business was very conducive to the demand and the shift we were seeing in the market. Initially, it sounds like you were helping find contractors for your friends. I assume at some point you were like, this is a side project, and then at some point it became the main thing. At what point was it like, yeah, I'm actually going to build this business for the next 20 years, versus this is a cool thing I'm doing at the start of college?
Well, the background is that I was always building companies in high school. I had a company that was doing pretty well, so I didn't want to go to college. And I told my parents, like, no, I'm not going to go to college. And they did not like to hear that. And so then eventually I appeased them. I applied to college, went to school. But I told them, like, I'm always going to drop out. And they didn't really believe me. They figured that it was a safe bet once I'd agreed to go to school.
And then I went to school, and every semester I'd tell them... They, like, blocked the term Thiel Fellow on your computer. Like, please don't look this up. Yeah, every semester, you know, I'd tell them the same thing. And then eventually I dropped out without really giving them a heads up or telling them, because I was like, I've been telling them for the last two years, right? You gave them a heads up. I gave them a heads up. The signs were there. A long heads up, right? Yeah.
And so I think for me, it was that I knew I just wanted to build a company. I was passionate about building things that have impact in the world rather than sitting through classes that didn't feel very productive, and I was in many ways just finding the right thing to spend my time on. I think with my co-founders, it started as a side project, you know, wanting to make sure they had the evidence to justify to their parents their decision to drop out.
And it's funny, part of their condition for dropping out was that we would raise money. And even though we had this business that was doing a million-dollar revenue run rate, we'd profited 80K after paying ourselves, and it was making a lot of progress, that wasn't sufficient. The key was that we needed to raise our seed round. That's what keeps us VCs in business. Parents wanting some validation. It's the credibility step. Well, that's a good segue. You recently raised a lot of money, a $100 million round. Congrats. Thank you. What does that kind of...
allow you to do now? Or how did you think about, you know, when was the right time to go raise more capital? I'm sure people want to throw money at you all the time. So like, how do you think about cutting off, when to cut off the spigot? Well, it's also interesting. The only time we went to raise money was really our seed round where we were like, okay, we need to raise money to justify dropping out. And then our series A and our series B, exactly. Our series A and our series B were both preemptive. And so our thought
process was that we wanted to keep dilution relatively low, at 5%, and sort of build up a war chest so that we can invest in the product capabilities that we were talking about: how do we, you know, have referral incentives and all sorts of these creative consumer products that can build up the supply side of our marketplace, as well as investing in more post-training data to improve our models' performance prediction capabilities.
And in many ways, one of the largest blockers on our ML team is just creating more evals and more and more RL environments to improve our models, which happens to be very conducive to our business. You have a kind of customer base of a lot of foundation model companies like
What do you think happens to that landscape over time? I mean, some people are like, you know, it will consolidate to two or three; maybe we'll see more. You know, how many different players do you think we end up with, and how do they ultimately differentiate? It's a very good question. I definitely am in the school of thought that OpenAI is and will continue to be a product company, not an API company. I think that so many of the API capabilities will get commoditized, and it's really how you integrate with all the customers' context that, over time, is where they're able to generate a lot of pricing power. But I think that
the market is going to be so large that I could see each of them leaning into a given segment that they're able to absorb a lot of value in. Like even if one of these labs were to just go all in on building a hedge fund, I bet they could make a ridiculous amount of money, right? And so, yeah, I think it's easy to pattern match and say these companies are overvalued. But if you really approach
the problem of automating knowledge work, and what that opportunity is, from first principles, it's hard to argue that these companies with such exceptional teams making so much progress won't be able to build really incredible businesses.
Yeah, I mean, obviously, today, it feels like there's been so much just, like, cross-domain generalization that it feels like it's trended toward more of a winner-take-all or top-take-most, versus, like, hey, we'll have one that's really good in this place and one that's good in that. Though I guess your hedge fund example is interesting insofar as, like, you could obviously... There's a lot more to build around the scaffolding of the model to make that work. Yeah, I mean, there's a lot of value to focus. I think that having a general API is probably not a great business for multiple companies. And so...
I think that there's going to be one player in that, likely one of the top two labs right now. Then there's going to be just a huge amount of customization that happens at the application layer for every vertical and every customer use case. Yeah. Do you think a lot of those custom models will require some sophisticated labeling?
Oh, certainly. I mean, there is like so much. I mean, imagine if every trading firm could have evals over the particular parts of their trading analysis that were accurate conclusions versus inaccurate conclusions, that translated to a trade doing well or not. And you had one of the top post-training teams that was just focused on, how do we optimize having the right trading analysis for sort of mid-frequency trades, faster than our human traders are able to get to it?
I think there is a huge amount of opportunity there. Talking to you, it feels like some trading firms' optimal strategy should just be to stop trading and spend nine months just, like...
laser focused on post-training a model. Maybe. I actually have been sort of surprised that a lot of the trading firms are less sophisticated in post-training than one would have anticipated. I think that part of it is just the geographic separation, of all of them being in New York, or having a good chunk of their core teams in New York, versus the labs being in San Francisco, and a lot of the top researchers wanting to work on AGI rather than making money.
I think that they're going to invest vast amounts in it. There's just going to be these nine-figure, ten-figure partnerships with frontier labs to help customize their specific use cases. What's the biggest unknown question you have in AI right now where you feel like, "God, if I knew the answer to this, it would have big implications for how I'm running the business today"?
I think it's what you said earlier of what humans will be doing in like five or 10 years. That's such a hard question to answer. And I think about that as the mission of the company in many ways. And we have all sorts of intuitions, but the world is changing very fast. I think like,
so many jobs are going to get automated that getting a better understanding of that, and how we can help define new opportunities for humans and the role that they play in the economy, is one of the most important things. - Yeah, is there more stuff that we should be doing from a policy perspective around this? Like how do you think about the role other institutions in society should play here?
Absolutely. I think that so many regulators have been very focused on things that actually aren't as close to impacting American lives, in that they're focused on competition with China, which, sure, it matters, but it's a lot less close to people's day-to-day. They're focused on safety risks, which matter, but are a lot less close to people's day-to-day. I think the thing that everyone's going to start freaking out about in the next two or three years is just that there's these models that
are significantly better than them at their jobs. And we need to figure out how they're going to fit into the economy. And that's something we know will happen, right? It's not just this low-probability, high-impact risk. And so I think that regulators need to be much more proactive around how we can plan for that future, around, say, expectation management for the general public and what the world will look like in a few years.
Yeah, I guess it's just hard not knowing what we're retraining people for. Yeah, it is. Exactly. But I wish that there was a lot more conversation around that, right? And a lot more focus on what that next generation of jobs is going to look like and what guidance we should be giving to everyone as they're going through school and entering the workforce. Yeah.
Well, we always like to end our interviews with a quickfire round where we get your quick take on some overly broad questions that we stuff in at the end. And so maybe to start, we'd love, you know, what's one thing that's overhyped and one thing that's underhyped in the AI world today? Oh, good question. I think that evals are underhyped.
very significantly. Even though they're hyped, I think they're still underhyped very significantly. One of the last bastions of human capabilities. Yeah, I think the one thing that's really overhyped is SFT and RLHF data, or that bucket of legacy data. There's companies that are literally spending billions of dollars on it that don't need to be spending, or need to be spending an order of magnitude less. And that'll change. What's one thing you've changed your mind on in the AI world in the last year? Interesting.
I think my timelines for automating software engineering have moved up significantly. I used to be a little bit skeptical when hearing from researchers what their timelines are to having a really good AI software engineer that's able to write a PR with a higher hit rate than a human. And I think now it seems clear that that's coming later this year, or sometime in the first half of next year. And that's going to be really, really cool.
Yeah. Do you think, I mean, obviously, it seems like with some of these AI improvements, if you had talked about them two years ago, you would have said, oh my God, that's going to change the world. And then they happened, and it's like, okay, that kind of adjusted things, but not... Do you feel like this is a wow moment where, you know, there's just mass change in employment on the software engineering side? Or is it one of those things that will feel like some 10% change or 20% change?
Well, I think the thing that frames it is the elasticity that we were talking about of the role, in that I'm less worried about the short time horizon of engineering jobs, because I think giving them tools to make them more productive will just mean we build more software. But it will definitely change the nature of the role, in that people that are product-minded, people that understand how to do the things that models might not be as good at, will have more of a comparative advantage in the market. What AI startup are you most excited about besides Mercor?
I'm really excited about OpenAI's coding capabilities, even though that's not a contrarian answer. I also think that there's going to be an immense amount of custom agents. And so there's a company I'm friends with that's sort of in stealth that I'm super excited about. All right. Well, you definitely can't share it on this podcast. When we stop recording, we'll harass you for what that is. Obviously, you know, like...
you're running a hugely impactful company. You know, let's say you were getting started today. You know, you were just beginning and building some AI app. Like, totally different category. Like, what else would you think would be fun to build right now? Or like, what else would you go spend time on? I think that I would choose a certain knowledge work vertical, probably something in finance that can be automated and build custom agents in that vertical to do so. You could build this AI trading firm.
Yeah, I would probably try to choose something that I think is more positively impactful, because I think that, you know, making sure that we get to the right valuation by the morning instead of the afternoon probably doesn't move the needle in the world. But yeah, I would choose something that I feel is super impactful, to automate certain capabilities. But, uh,
Yeah, it's a cool world. Yeah. Well, I always want to leave the last word to you. It's been a fascinating conversation. Where can folks go to learn more about you and the work you're doing at Mercor? The mic is yours. Anywhere you want to point our listeners. Yeah, absolutely. Go to our website, mercor.com. We're hiring huge volumes of people, smaller volumes for ourselves and huge volumes for our customers, and we have all sorts of great opportunities that we would love to work with people on. Awesome.
Uh, well thanks so much. That was super fun. Yeah. Thank you so much. That was a lot of fun.