Hi, listeners, and welcome to No Priors. Today, we're chatting with Brendan Foody, co-founder and CEO of Mercor, the company that recruits people to train AI models. Mercor was founded in 2023 by three college dropouts and Thiel fellows. Since then, they've raised $100 million, surpassed $100 million in revenue run rate, and are working with the top AI labs. Today, we're talking about where the data for foundation model training will come from next,
evaluations for state-of-the-art models, and the future of labor markets. Brendan, welcome to No Priors. Brendan, thanks so much for doing this. Yeah, thanks for having me. Excited to be here. So you guys have had a wild last six months or so. There's huge traction in the company. Can you just talk a little bit about what Mercor does?
Yeah, so at a high level, we train models that predict how well someone will perform on a job better than a human can. So similar to how a human would review a resume, conduct an interview, and decide who to hire, we automate all of those processes with LLMs. And it's so effective. It's used by all of the top AI labs to hire thousands of people that train the next generation of models. What are the skills and job descriptions that the labs are looking for right now?
It's really everything that's economically valuable, because reinforcement learning is becoming so effective that once you create evals, the models can learn from them and improve their capabilities. And so for everything that we want LLMs to be good at, we need evals for those things.
And it ranges from consultants to software engineers, all the way to hobbyists in video games and everything that you can imagine under the sun. And it's really whatever capabilities you're seeing the foundation model companies invest in, or even application layer companies invest in, the evals are upstream of all that.
And are you also helping companies outside of the core foundation models with a similar type of hiring, or is it mainly just focused on AI models right now? Yeah. So actually, when we started the business, it was totally unrelated to human data. It was just that we saw that there were phenomenally talented people all around the world that weren't getting opportunities, and we could apply LLMs to make that process of finding them jobs easier and
more efficient. And then we realized after meeting a couple of customers in the market that there was just this huge vacuum because of the transition in the human data market and that the human data market used to be this crowdsourcing problem of how do you get a bunch of low and medium skilled people that are writing barely grammatically correct sentences for the early versions of ChatGPT. And it was transitioning towards this vetting problem of how do you find
some of the most capable people in the world that can work directly with researchers to push the frontier of model capabilities. But we've still kept that core DNA of hiring people for roles, human data and otherwise. And a lot of our customers hire for both. Do you think all of hiring eventually moves to these AI systems assessing people, or at least all knowledge work? I think certainly, because we're already seeing on most of our evals that models are better than human hiring managers at assessing talent. And it's still like the very early innings.
And so I think we'll get to a point where it'll almost be irrational to not listen to the model, right? Where people trust the model's recommendation. And maybe for legal reasons, we'll still have the human pressing the button and making the final sign-off, but we just trust the model's recommendations on who should be doing a given task or job more than we trust the humans. I guess in any field, people say that there's 10x people.
There's 10x coders who are way more productive than the average coder. There's 10x physicians or investors or you name it. Do you see that in terms of the output of your models? In other words, are you able to identify people who are outliers? Totally. This is one of the most fascinating things is that
The power law nature of knowledge work frames the importance of performance prediction. Imagine if you could understand which engineers on an engineering team are going to perform in the 90th percentile, right? Or even if you could say, I know that this person that costs half as much is going to perform in the top
quartile, right? It frames like how you think about the value that we create for customers and how you think about like the long-term economics of the business. And it all ties back to like, how do you measure the customer outcomes and really go on them? And is it a power law or what sort of distribution is it? Because people always talk about human performance as a bell curve. Do you think that's actually true or do you think that's the wrong way to interpret human performance relative to knowledge work?
It's very industry by industry, right? Like for you in investing, it's the most power law thing imaginable, where just the top handful of companies each decade are the ones that matter such a disproportionate amount, and it's the investors that got into those. Versus if you're hiring factory workers, it's a much more commoditized skill set. There is a lot less of a difference. And I think software engineering is somewhere in between, right?
It's definitely very power law, but I don't think it's as power law as, say, the handful of best investors in the world. Do you have a prediction, whether because of the distribution of skill level or because of measurability, for where you should expect models to be better at evaluating or identifying talent first, beyond human data? Yeah.
Yeah, so it's really everything that you can measure with text that the models are really good at. Like if you can ask questions in an interview and read through the transcript, the models are superhuman at that. Across
many more domains than one would think. It's more domain-agnostic than I would have initially anticipated. I think the things where models are going to be slower are the multimodal signals and understanding how passionate this person is about what they're working on, right? Like how persuasive they are, or how good they are at sales.
And those capabilities will come, but they'll just take a little bit more time. So that's my mental model for thinking about it right now. Right. So like if I'm interviewing a candidate for one of our companies and they are saying the right words about, you know, motivation level, but I don't believe it, like that might be a next level signal if I have any predictive power here. Totally, totally. Exactly. The other thing is that the models are way better at high volume processes.
An example is like, say you're assessing 20 people for the same job and you hire those people, you see how they perform. It's very easy to attribute features of each person's background to how they perform, right? It's sort of the stack ranking where you can understand like this person had this nuance in their interview or this person had this nuance in their resume. And that was the thing that explained how well they performed on the job versus if those 20 people are performing 20 different jobs.
Then it's just this like mess of figuring out like what is causing what things to happen. It's way more difficult to understand like what features are actually driving signal. And so I think it'll be those higher volume processes that also get automated first. Is there anything that...
surprises you about like basically the discovered features in terms of, I don't know, any domain that you are working on today that identifies amazing talent? That's a very good question. Or maybe in engineering because it's relevant for many of our listeners. Yeah, I think that...
One of the really interesting things for engineering is that there's so much signal about a lot of the best engineers online that I don't think people properly tap into, right? It's everything ranging from their GitHubs to the personal projects on their websites to the blog posts that they wrote during college. It's just that it's bottlenecked by manual processes.
The hiring managers don't have time to read through all this stuff, right? Or with designers, they don't have time to consider every proposal or image from someone's Dribbble profile before doing their top-of-funnel interviews. And so I think one of the things where people are under-indexing on signal the most is the things that can be found online.
But then a lot of the things that can be indexed on during an interview, like how passionate is this person, does this person have the skills that the job would require, I think humans are relatively good at. At least they're a little bit more adept right now. Are there hidden signals for other types of domains where there's less online work? An example of that would be physicians, lawyers. There are a lot of other professions where...
Yeah, there's all sorts of these hidden signals. Like one interesting one we've seen in the past is that people who are based internationally but study abroad in a Western country tend to work much more collaboratively or communicate better with
people. And they're the kinds of signals that make sense when you look backwards and evaluate them, but are hard for a human, without having full context of everything happening in the market, to really understand and appreciate. And one of the most important things, as you can imagine, is just how intrinsically motivated and passionate people are about a domain. And so you're looking for signals of that, not just on their resume and in their interviews, but also online, of what indicates this thing,
right? And it pertains not just to who you hire, but also what those people should be working on, right? Imagine the nuance between hiring a biology PhD to work on biology problems versus hiring the person who wrote their thesis on drug discovery to write problems and come up with innovative solutions contextual to their thesis. And there's just so much inefficiency with
the way that we do matching, the way we use all those signals right now. So you're eval-ing people. Are you also doing evaluations of the models relative to the people? Yeah, yeah, of course. And then what is your view in terms of the proportion of people who eventually get displaced by these models? In other words, if you can tell the relative performance and you can look at relative output,
How do you start thinking about either displacement or augmentation or other aspects like that? I think displacement in a lot of roles is going to happen very quickly and it's going to be very painful and a large political problem. Like I think we're going to have a big populist movement around this and all the displacement that's going to happen. But one of the most important problems in the economy is figuring out
how to respond to that, right? Like, how do we figure out what everyone who's working in customer support or recruiting should be doing in a few years? How do we reallocate wealth once we approach superintelligence, especially if the value and gains of that are
more of a power law distribution. And so I spend a lot of time thinking about how that's gonna play out. And I think it's really at the heart of- - What do you think happens eventually? X percent of people get displaced from white-collar work. What do you think they do? - I think there's gonna be a lot more of the physical world. I think that there's also gonna be a lot of niche skills- - What does the physical world mean?
Well, it could be everything ranging from people that are creating robotics data to people that are waiters at restaurants or are just, like, therapists, because people want human interaction, whatever that looks like. I think that automation in the physical world is going to happen
a lot slower than what's happening in the digital world just because of so many of the self-reinforcing
gains and a lot of self-improvement that can happen in the virtual world, but not the physical one. Do you have a point of view on what types of skills, knowledge, and reasoning are worth investing in now as a human expecting to stay economically valuable? So Sam Altman said this thing when someone asked him about this, that people should optimize for just being very versatile and able to learn quickly and change what they do.
And I think that resonates a lot because there's so many things that one would think the models aren't good at, that they get very good at very fast, that I almost think you just need to be able to navigate that quickly. What are the characteristics of those things that you think models will learn the fastest? Like if you were to say, here is a heuristic. Yeah.
What do you think are the components of that? If it's verifiable. For things like math or code that are verifiable, they will get solved very quickly. So you want a feedback loop or utility function that you're optimizing against as a model. For things that aren't verifiable, like maybe it's your taste in a founder, right? That's much harder to automate. And it's also a very sparse signal because, yeah, there's just not that much data on it. This is a pretty fundamental research question right now. But what do you think are the most interesting ideas about verifiability beyond code and math?
Well, I think that there are ways that you can have, say, certain autograders or criteria that humans can apply, or that models can apply. And I'm very interested in how that will play out over time. And there's obviously a lot of other domains where models will take unstructured data, they'll structure it, they'll figure out how to verify it. And it's very industry by industry. I think it's going to be hard for one lab to do everything there. And there's going to be
you know, more specialization as we progress further and further and marginal gains in each industry become more challenging. How much do you believe in generalization from the code and math type reasoning and intelligence? Like, if I'm this much better at proof math, does it make me funny eventually? Me being the intelligence. Yeah, I generally believe in it.
But to a certain extent, like you still need a reasonable amount of data for the new domain and to kickstart it. But there's going to be a lot of transfer learning. I think it's very funny when Sarah does proofs. So I think it all fits. She gets proofs. I actually think being bad at proofs is funny.
Okay, let's talk about evals, because you're working on the bleeding edge of model capability. There has been this whole sense of what people call an evaluation crisis, around the models being so good, and somewhat indistinguishable at the fringe of capability today, that we don't know how to test them, ignoring all the issues with
people gaming the benchmarks, right? What do you think are the right ideas about evaluating models, especially as they become superhuman? Well, I think one of the most important things is that a lot of the evals historically have been for, like, a zero-shot prompt to a model, or a test question that might be academic, when the thing that we actually need to eval is what's economically valuable.
When a software engineer goes to their job, it's so much more than writing a PR. It's coordinating with all of the relevant parties to understand what the product manager wants, and how that fits into the priorities of each team, and how that all translates to the end output of work. And so I think we're going to see an immense amount of eval creation for agents. And that is the largest barrier to
automating most knowledge work in the economy. Where should people start? Like, that feels not terribly generalizable. So Sierra has something called Tau-bench that I think people are trying, and there are other efforts here, but it is perhaps more specific to a certain function. Yeah, I think that...
People will need to have these by industry, and they should probably start with tasks that are more homogeneous, right? Customer support tickets, I think, are a great example, because there's one interface that the customer support agent interacts with. Maybe they call a couple of tools, like accessing the database or reading through the documentation, but it's a relatively homogeneous, uniform task.
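(As a concrete illustration of the kind of homogeneous, tool-using eval Brendan is describing, here is a minimal sketch. The ticket, tool names, and grading check are hypothetical stand-ins rather than any lab's actual harness, and the agent is a scripted stub where a real harness would call a model.)

```python
# Minimal sketch of a tool-using customer-support eval (all names hypothetical).
# A real harness would call an LLM agent; the agent here is a stub so the file runs.
from dataclasses import dataclass, field

@dataclass
class SupportEnv:
    """Fake environment that the agent's tool calls mutate."""
    orders: dict = field(default_factory=lambda: {"A123": {"status": "shipped", "refunded": False}})
    emails_sent: list = field(default_factory=list)

    # Tools the agent is allowed to call.
    def lookup_order(self, order_id: str) -> dict:
        return self.orders.get(order_id, {})

    def issue_refund(self, order_id: str) -> bool:
        if order_id in self.orders:
            self.orders[order_id]["refunded"] = True
            return True
        return False

    def send_email(self, to: str, body: str) -> None:
        self.emails_sent.append((to, body))

def scripted_agent(env: SupportEnv, ticket: str) -> None:
    """Stand-in for an LLM agent: reads the ticket and calls tools."""
    order_id = "A123"  # a real agent would extract this from the ticket text
    if env.lookup_order(order_id).get("status") == "shipped":
        env.issue_refund(order_id)
        env.send_email("customer@example.com", f"Refund issued for {order_id}.")

def grade(env: SupportEnv) -> float:
    """Outcome-based check: was the refund issued and the customer notified?"""
    refunded = env.orders["A123"]["refunded"]
    notified = len(env.emails_sent) > 0
    return float(refunded and notified)

if __name__ == "__main__":
    env = SupportEnv()
    scripted_agent(env, "My order A123 arrived broken, I want a refund.")
    print("reward:", grade(env))  # 1.0 if the end state matches the desired outcome
```

The point of the sketch is that the grade is on the final state of the environment, not on any single model response, which is what distinguishes an agent eval from a zero-shot test question.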
I think the things that are going to be more challenging, but also in many cases more valuable, are creating evals for these very, very diverse tasks, right? All the things that go into making a good software engineer. That's going to be really hard to do. I think it's going to be
a years-long build out for even some of the verifiable domains because there's so much that goes into a good software engineer of like, how do they have taste for like, you know, what is the right way to approach a problem or what are the products that people really enjoy using? And I'm really excited for that. So if you were to counsel people with young kids, say your child is, I don't know, five to ten. Yeah. Should their kids learn computer science? I would.
probably not push them towards teaching their kids computer science, but I'm not totally against it. I think the key thing is I would encourage them to just find something that's intellectually stimulating, that they're really passionate about, where they can learn general reasoning capabilities.
And those reasoning capabilities will probably be very valuable and cross applicable. I always loved building companies growing up and hustling and doing small things like that. And I think that is something that could be helpful. But I am skeptical that
the really valuable thing is just people who can code in five years. I think it's much more likely to be the people that have these contrarian ideas around what's missing in markets and have the taste for what features and nuances need to go into solving that problem. You said taste a few times. Are there signals of taste that you feel like you can discover in any domain? Yeah, absolutely. I mean, I think that oftentimes you just want to
see the softer signals of how people think about certain problems. And certain people have intuitions, whether it be the way they approach a problem or, if they're looking at different products, how they notice nuances. It's very contextual to the industry, but it's important to measure. How can you score it?
What's the positive feedback loop here? We've done a variety of things, but oftentimes we will give people a problem that as closely as possible mirrors what they would solve on the job, and then we see how they compare to other people. And so that helps with scoring it. Yeah, and is their thought process part of that? I imagine, for example, it's almost like looking at code reviews or other sorts of intermediate work along the way.
We definitely do. One thing I've realized about talent assessment is that a lot of people focus too much on the proxy for what they care about rather than the thing they actually care about. And so ideally, you want to measure the thing that you actually care about. So if it's that person building an MVP of the product, ideally, you have an interview that's like a scoped down version of doing that. The place where you need to use proxies is when it's like a longer horizon task where you just want to structure the proxy to get as much signal as possible. And so that's
sort of how I think about talent assessment. Yeah. Can I ask a scale-of-impact question? So if I think about the very largest employers today, let's call it low single-digit millions of employees. Yeah. Right. Or, I don't know, adding in contractors and Amazon workers and such. But how many people do you think will end up doing data collection?
I think it's a huge volume. I think the reason is that it all comes down to like creating evals for everything in the economy. I think part of that will be current employees of businesses that are creating evals for that business so that those agents can learn what good looks like. Part of that will be
you know, hiring out contractors through a marketplace to help build out those evals. But it would not surprise me if that becomes the most common knowledge work job in the world. How long does that last?
So effectively, people are being brought on to displace themselves. This is true. Is that a six-month cycle? Is it a two-year cycle? What is the length of time at which people have relevancy relative to some of these tasks? There's always a frontier. Unless it becomes superhuman, right? Yeah, unless it becomes superhuman. It's almost like time to superhuman. But I had an interesting conversation, which is that you don't even know that you have superintelligence without having evals for everything. Yeah.
Because you sort of need to understand what is the human baseline and what is good. It's grounded in this understanding of human behavior. Yeah, a friend of mine basically believes in a version of the Nyquist theorem, which is basically: if you're sampling a signal, you need to sample it at twice its highest frequency in order to be able to actually reconstruct what it is. Otherwise, you're not sampling richly enough to know. Mm-hmm.
And so he views that there's some version of that for intelligence. Like you can tell if somebody's smarter than you, but you don't know how much smarter because you aren't capable of sampling rapidly enough to understand it. And so I always wonder about that in the context of super intelligence or superhuman capabilities in terms of how smart can you actually be
since it's hard to bootstrap into the eval. Well, so I think like when you take it to the limit and you have super intelligence, what you're saying makes a lot of sense. But another way I think about it is that if we classify knowledge work in two categories, one is like solving an end task.
where it's sort of a variable cost of like you need to do that repeatedly. And the other is creating an eval to teach a model how to solve that task, which is like a fixed cost that you do one time. It does seem structurally more efficient for work to trend away from the variable cost of like doing it repeatedly towards this fixed cost of how do we build out the evals and the processes for models to do this themselves.
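(One way to make that fixed-versus-variable-cost framing concrete, as a back-of-the-envelope sketch with symbols introduced purely for illustration, none of which appear in the conversation: let c_h be the human cost per task instance, c_m the model cost per instance, c_e the one-time cost of building the eval, and N the number of times the task recurs. The eval-plus-model route pays off when)

```latex
% Illustrative only: c_h = human cost per task, c_m = model cost per task,
% c_e = one-time eval-building cost, N = number of times the task recurs.
\[
\underbrace{N\,c_h}_{\text{variable: humans do the task each time}}
\;>\;
\underbrace{c_e + N\,c_m}_{\text{one-time eval build plus model runs}}
\quad\Longleftrightarrow\quad
N > \frac{c_e}{\,c_h - c_m\,}\qquad (\text{assuming } c_h > c_m).
\]
```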
That said, it all comes down to how fast are we approaching superintelligence? If the models are just getting that good that fast, then sure, I don't think we would need humans creating evals very much, but I also then don't think we would need humans in many other parts of the economy. And so you sort of need to be thoughtful about the ratio of that. Does that create an asymptote in terms of how good these things get, or do they start creating their own evals over time?
I think that they'll play a role in creating their own evals, where they might come up with certain criteria for what a good response looks like and humans validate those criteria. However, I think you often need to ground this in the experts in that particular domain. Sure. But I'm just thinking, like, Med-PaLM or something, right?
Or Med-PaLM 2, where the output of the model was better than the average physician's. It was basically a health model that Google built. And they would use physician panels to rate outputs of the model versus individual physicians. And the model did better by far than individual physicians.
At some point, it should do better than the physician panels, where feedback from the physician panel should make the model worse, right? In other words, if you just RL'd off of individual physicians, the model was already going to get worse. And so there's a little bit of this question of when does human scoring
create worse outcomes because the humans aren't as good at a task? Well, I think the models will be able to delineate between the valuable human knowledge and the human knowledge that's not valuable. And that maybe you have doctors that create like a bunch of evals for this particular task and the model realizes like, wow, like I see the mistake that the doctor made on these particular tasks, but I'm going to ignore them. And like, here are the things that seem insightful or the things that I can learn. And
the models will, yeah, use that data and value that data immensely. The other thing I'll say is that I think it's easy to look at these evals and the rate of improvement on them and just think we're a lot closer to superintelligence than we are. But the truth of the matter is there is a lot between being really good at SWE-bench and replacing software engineering.
Right. There are all the coordination problems we talked about. There's so much else that goes into that. And I think that we're just going to need a lot of evals for tool use. We're going to need a lot of evals for agents. And that build-out is going to take a lot longer than, you know, a couple-year time horizon. How do you think about incentives for all of these expert
knowledge workers? Because the opportunity cost for a great software engineer with taste and architectural understanding is a great job at Mercor or another, you know, interesting tech company. And the geo-arbitrage you get on basic knowledge work doesn't exist as the skill level increases over time. That's true in coding, it's true for physicians, it's true for finance people, lots of areas where you might want evals and labels.
Totally. I think that it'll definitely become more power law over time, which means that like the best people are going to, of course, make an incredible amount of money. Do you think it's more just turn up the dial on like what any piece of information is worth from the higher skilled workers?
Yeah, yeah. But you also want like the evals at the frontier of what the models can't do. And so it might be that for like a very well-scoped problem, like answering a medical question that someone has, you might need to get the like
world-class doctor that is one of the handful of people able to be better than the model on that very well-scoped problem. But for the broader agentic problem of, how do we talk about this case in a way that the patient is receptive to? How do we then coordinate with this set of tools to help complete the diagnosis and send whatever emails at X
time. I think for those kinds of things, I still expect that the bulk of the bell curve, people that are closer to the mean of the distribution will be able to contribute for a longer period of time. What do you think is the biggest shift that nobody's really anticipating that's coming? It could be domain specific. It could be broader. Well, so maybe I'll answer this in two parts. Because when I think about nobody, it's like
It feels like the bulk of the country is not really coming to grips with how fast jobs will be displaced. And that just feels like a big problem, as I said before. And I think that we need to stay very proactive as a government, as an economy, etc. Are there certain areas where you're already seeing large-scale job displacement that you don't think is being reported on? It's definitely being reported on in customer support
and in recruiting. I think one of the challenges is that a lot of this happens during economic contractions, when people get more efficient and more focused on the bottom line. And so I think that a lot of it hasn't happened yet, but it's going to happen imminently. And then in terms of things that maybe no one, even in
San Francisco is thinking about, which is another interesting part of that problem, is that agentic evals for non-verifiable domains are significantly under-indexed on. Another thing is that people in San Francisco have a tendency to not think critically about the role humans will play in the economy, because they're so focused on automating humans. And so I think it's important to think more about that problem. One thing that
I've thought about is that ideally models should help us figure that out over time, right? Like, what are the things that people are passionate about? What motivates them? And maybe it doesn't need to be an economically valuable thing. Maybe it's just a certain kind of project that they like working on. And I think that people aren't
indexed enough on how humans will fit into the economy in 10 years. You know, one thing that I feel I really misunderstood, or didn't quite understand the scope of, was the degree to which we effectively had different forms of UBI, or universal basic income, in different sectors of the economy. Government is a clear example, where there's enormous waste, fraud, grift, etc. happening. Yeah.
parts of academia, if you just look at the growth of the bureaucracy relative to the actual student body or faculty, big tech, if you look at some of the size, you know, basically a lot of these things were effectively UBI. And so to some extent, one could argue that parts of our economy are already experiencing what you're saying in terms of there's
high paying jobs that may or may not be super productive on a relative basis. And so the question is, is that something that we actually embrace as a society given some of these changes in displacement? And if so, where does that economic surplus come from?
Yeah, it's interesting. I think that as we have better analytics around the value of employees, it seems intuitive that these companies will, you know, start doing more layoffs, more cuts, etc. Do you think those evals become illegal at some point? Because it feels like that happened a little bit with certain aspects of merit or merit-based testing for different disciplines or fields. It happened with the government in the 70s, where they removed it as a criterion.
I'm just wondering if that becomes something that more generally people may not want to adopt because it exposes things, or do you think it's something that is inevitable economically? There's definitely going to be pushback, but I think it's inevitable economically, because it's hard to regulate and it's just so
valuable to companies that they'll move towards it. I think it depends on what segments of the economy, because some of these are not economically driven already. They're just not efficient as sectors. But if you look at healthcare or education, everybody's seen this chart that shows a bunch of industries that have some measure of output per dollar spent.
And you have increasing spend on health care and education and no improved output. Yeah. And that's happened for a long time while there's been an increase in productivity in many other sectors. And the answer is there's no economic pressure. Sure. It's regulated versus unregulated sectors, effectively. And the regulation is what causes the divorce from economics. Yeah.
Yeah. Also, one thing that I think is very interesting is that a lot of people are in the mindset of AI being really good as an individual contributor, when actually it may soon become much better at being a manager, right? Taking a large problem, breaking it down, figuring out how to performance-manage people on what they should be doing. And this ties into your point around what we should do with all those unproductive employees. Right.
Because if we have like a ruthlessly rational agent that is making the decision there, it is probably going to be very different than a lot of the decisions that have been made historically. One of our companies asked recently what I would expect an assistant to do that it doesn't do today. Right. And I think the biggest thing is like, you know, if I give it enough context and some objectives that I'm trying to achieve, I'm not like a particularly organized person. I have a lot of
output, I think, all things relative. But, you know, is it like perfectly prioritized and tasked out and sequenced so I'm not bottlenecked on a particular thing? No, right? And I would absolutely expect that the assistant can do that for me. Totally. And it goes to the point earlier, right? Just tell me. Tell me what to do for the next three minutes. We have these models that are like
incredibly good at math, right? Like you give them a test and they can ace the test, but they still can't do basic personal assistant work, right? And I think it goes to show that there's still a lot of research and product to be built out. And like, how do we actually bridge the gap with what's economically valuable, to complete that end-to-end job that you're willing to pay a human salary for? Do you think the models are good enough for that, and there's just incremental engineering work to make it better? Or do you think it's, okay, do we actually have model capabilities that you think would allow us to build certain types of truly agentic systems, versus we need, like... That are proactive, too. Actually, maybe let me put it this way. I think with a small amount of evals for agents in various categories, the base model has all the reasoning capabilities. And the reason you still need those evals is the models need to understand when they should be using tools in certain ways. They need to understand how to synthesize information from those tools.
But it's not a reasoning problem. It's like much more this problem of like learning each company's knowledge base and like what good looks like in that role. And so there is going to be some like post-training and I'm very bullish on RFT and everything that's going to mean.
Can you say more about RFT and explain it for our audience? Yeah. So basically, everyone used to talk about fine-tuning in the context of SFT, supervised fine-tuning, where you would have inputs and outputs for a model, and the model would learn from those input-output pairs. But the main issue, and the reason supervised fine-tuning customization never really took off, is that it wasn't very data efficient.
Like companies would create a few hundred and eventually try to scale it up to tens of thousands or hundreds of thousands of SFT pairs, but oftentimes wouldn't be able to get a lot of the capabilities that they were looking for. Whereas in reinforcement fine tuning, you instead define the outcome that you care about. So in Sierra's case, like I was talking with them about how they define
what a good customer support response would look like. In our case, we define what are the key things that you should identify as a characteristic of this candidate, whether it be that they're passionate during their interview, they demonstrate XYZ domain knowledge, or they worked on this side project that demonstrated that skill. And then you reward the model for identifying that. So you set the solution and then the model can
learn in that environment how to get really good at it. And the reason I'm so optimistic about it taking off is that it's like profoundly data efficient, right? And it finally makes sense to customize models at the application layer. And profoundly data efficient is actually like hundreds to thousands of examples, like some tenable number for an enterprise or a medium-sized business to think about versus like...
I don't know, a billion tokens. Yeah. Yeah. Yeah, exactly. And so it'll be very cool. I think we're going to have these agents that fill all roles that employees currently fill, working alongside employees. Human employees will help create the evals. I also think that contractors in our marketplace will play a large role in that. It will just be this huge build out of evals to create custom agents across every enterprise.
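(To make the SFT-versus-RFT distinction concrete, here is a minimal sketch in the spirit of what Brendan describes. The rubric, the keyword-based grader, and the loop shape are hypothetical illustrations, not Mercor's or any lab's actual setup; in practice the grading step would itself be done by a model or a domain expert.)

```python
# Hypothetical sketch contrasting SFT data with an RFT-style reward; not a real API.

# SFT: you supply explicit input -> output pairs and the model imitates them.
sft_examples = [
    {"input": "Resume + interview transcript for candidate 17",
     "output": "Strong hire: deep domain knowledge, clear communication."},
    # ...typically you need very many of these before capabilities emerge.
]

# RFT: you instead define what "good" means and score the model's own attempts.
rubric = [
    "Cites concrete evidence of domain knowledge from the transcript",
    "Notes signals of intrinsic motivation (side projects, thesis topic)",
    "Gives a calibrated recommendation rather than a generic summary",
]

def reward(model_assessment: str) -> float:
    """Toy autograder: fraction of rubric criteria the assessment satisfies."""
    checks = ["evidence", "motivation", "recommend"]  # crude keyword stand-ins
    hits = sum(1 for keyword in checks if keyword in model_assessment.lower())
    return hits / len(checks)

# Training loop shape (pseudocode-level): sample, score, reinforce.
# for prompt in candidate_prompts:
#     attempt = model.generate(prompt)
#     r = reward(attempt)
#     optimizer.step_towards(attempt, weight=r)  # higher-reward behavior is reinforced

print(reward("Recommend hire: strong evidence of skill and real motivation."))  # 1.0
```

The data efficiency Brendan points to comes from the fact that one rubric plus a few hundred prompts can score an unbounded number of model attempts, whereas SFT only ever teaches the exact outputs you wrote down.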
What is most important for Mercor to get done in the next year or so?
So there are two things that we focus on as a business. I think those will be most important for this year as well as for the next five years. The first is, how do we get all the smartest people in the world on our platform? That ties into the supply side of our marketplace and the marketplace network effects, similar to an Uber or Airbnb, because then we have the best candidates and we're able to give them job opportunities and understand what they're looking for. The second thing is predicting job performance. Are you trying to offer anything that isn't comp?
Yeah, we are. So one of the things that we realized is that the average labor marketplace has a 50 to 1 ratio of supply side relative to demand side, which means the average person that applies talks to their friend who also applied and neither of them got jobs. And it's almost just this like structural part of building labor marketplaces.
The way to actually scale up the labor marketplace to have hundreds of millions of the smartest people in the world on the platform is to build all of these free tools, such as AI mock interviews, AI career advice, shareable profiles for people, all of the things that just create the most magical experience possible for consumers and give that away for free because it's powered by this monetization engine on the other side of the business. And so that's
a very significant focus for us. I interrupted you. You were going to talk about what else was important. Yeah. It's performance prediction. So we get all the data back from our customers of who's doing well, for what reasons, and how can we learn from all of those insights to make better predictions around who we should be hiring in the future. And that's the data flywheel that you would find in many of the most prominent companies in the world. And I think that
The marketplace network effect is the more obvious one when you look at the business. But I actually believe that the data flywheel will become more important over time based on a lot of the initial results that we're seeing. How do you view the labor markets evolving over the very long term?
Well, I think that the largest inefficiency in the labor market is fragmentation and that a candidate, wherever they are in the world, will apply to a dozen jobs and a company in San Francisco will consider a fraction of a percent of people in the world because it's all constrained by these manual processes for matching.
right, where they need to manually review every resume, conduct every interview, and decide who to hire. When you're able to solve this matching problem at the cost of software, it makes way for a global unified labor market that every candidate applies to and every company hires from. And I believe that that's not only the largest economic opportunity in the world, but also the most impactful one.
Insofar as you can find everyone the job that they're going to be passionate about and successful in. Would that include AI agents? In other words, the marketplace would be a hybrid of people and agents all competing for labor globally? I think so, because customers ultimately come with a problem to be solved, right? And ideally, it's some coordination of how those two fit together. Given you spend all your time thinking about
how to attract high-skilled candidates and determine their effectiveness, what advice would you have for people who are hiring in startups and scaling companies? Early on, it's hard to overstate the importance of talent density. And there's always a tradeoff between hiring speed and hiring quality, and you should just,
for those early employees, like always index on quality. Like you need to be patient and you need to make sure that people are extremely high caliber. When you're scaling up an org, you obviously don't want to drop those standards, but people need to be a lot more data driven around what are the characteristics of people that actually drive the outcomes they care about. And it feels like where a lot of problems happen is when that slips, when it's sort of like
this vibes based assessment that doesn't scale very well, where each hiring manager is doing it in a fragmented way. And it's hard to enforce those standards across the board. And so just being very disciplined around like what are your hiring goals? What are the characteristics of people that you know are actually going to achieve the business outcomes you care about? And how do you measure those things is really important.
I found that almost every great company either hires well, like what you're talking about, or fires well, which is sort of your phase two. But I think often they do one of those things really well early. For some reason, most people don't seem to get both right early on. I don't know why it is. I think it's almost like a founder bias or something like that. And then I feel like over time, hopefully they pivot into both. Google was a good example of an
organization that would always hire well but couldn't fire well. It took them a really long time to clean people out. Years, like literally years. Interesting. Facebook, on the other hand, was kind of known for a more mixed early talent pool, but they were very good at removing early people who weren't performing. So I always thought that was kind of an interesting dichotomy between the two. And those were the rumors in the Valley when each company was tens or a little hundreds of people. Now, obviously, they're all very professionalized in terms of how they do both. They have their UBI. Yeah, exactly. Yeah.
So I thought that was kind of interesting. Yeah, I think it's just because I mostly think about engineering hiring and go-to-market hiring and investor hiring. They're all professions that have some timescale of outcomes that isn't, like, an hour, right? And so I think you're always looking for a proxy of outcomes for these longer-outcome jobs. And I think there's a really interesting question, very related to evals and assessment, of,
well, what are the proxies we're going to discover for each of these roles? Because I think it's a huge shortcut in hiring well, not necessarily firing well. If you can do references, if you can do work trials with engineers, you actually know a lot in the first five days, 30 days.
of whether or not something's going to work out. Totally. And like, you know, I think we're always, I'm always looking for proxies for that. Yeah. And I think one of the crazy things about the market is that any candidate that you do a work trial with has probably done work trials with like a lot of other top companies in San Francisco. If you don't have any of the data on that,
Right. And obviously, there is like some interesting data like privacy and centralization questions of like companies want that to be their proprietary knowledge. But I think that market is going to trend towards becoming a lot more efficient over time or even the references of people. Right. Of like those that you don't hire. Theoretically, it's beneficial for the top companies to understand the reasons that, you know, other companies in different markets aren't hiring specific candidates, et cetera. What do you think companies that attempted some sort of
Like common, generic evaluation, like the Hireds of the world in a previous generation, got wrong? Right. Because the theory of, well, we should have a common application of some kind or a shared assessment has existed, but not worked at scale or at quality. I think that LinkedIn centralizes and aggregates the very first layer of the application process
of like, what are the things that this person has done and like, who are they connected to? The challenge historically has been that the rest of the process to facilitate a transaction has not been possible to aggregate and automate. It wasn't possible to like actually record all of these interviews and like scalably conduct interviews of everyone. It wasn't possible to like, you know, get all of this like data and analyze it properly on like, what are the things that go into causing someone to perform well? And so I think there's just this like
huge why now that's enabled by LLMs becoming so capable so quickly. That makes sense. I think one of the theories that my partner Mike has is around the scalability of LLMs being able to interrogate humans and the usefulness of that data in a bunch of different domains. And it would be great to see the aggregate of that for hiring.
So my co-founders and I are all Thiel fellows. And so we're very passionate about how we could apply LLMs to help identify the next Thiel fellows. And so I often wonder, imagine if you could have Peter Thiel, as a heuristic, interview everyone in the world when they're 18, right? And maybe he could go through and meticulously spend time determining who is actually going to be good at what job. I think we're approaching that world very quickly. It'll be fun to see how that
impacts the labor market, the investing market and everything else. That's really cool. Thanks for doing this, Brendan. Yeah, it's awesome. Thanks for having us. Thanks for coming. Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.