I remember, about a year ago, one of these conversations around "are we going to have AGI?" One of the arguments for it was that, well, at some point the AI will get good enough to just design chips better than humans can, and then it will just eliminate one of its bottlenecks to getting greater intelligence. And so it feels like we're on the pathway to that in a way that
we just weren't before. So we were talking about: what are you going to do with these two more orders of magnitude? Since then, Sam has told me that he actually wants to go to four orders of magnitude.
It's the worst that these models are ever going to be, right now, right this moment. Week to week, there are things that you couldn't do maybe a month ago that you can do really, really well right now. So that sounds like a pretty crazy moment in history.
Welcome back to another episode of The Lightcone. I'm Gary. This is Jared, Harj, and Diana. At Y Combinator, we've funded companies worth more than six hundred billion dollars, and we fund hundreds of companies every single year.
So we're right there on the edge of seeing what is going to work, both in startups and in AI. Recently, Sam Altman wrote this pretty wild essay that predicted that AGI and ASI are coming within thousands of days. Seeing him on Monday, he actually directly estimated between four and fifteen years. Have you guys read this essay yet? And what do you think?
Yeah, I read it. One of the places where I think we have a unique perspective is that we had a front-row seat to the very beginnings of OpenAI. OpenAI basically spun out of YC. And so what was cool to me, reading this essay, is that it's literally the same ideas that Sam was talking about in twenty fifteen,
when he started OpenAI. He's been talking about this basically since I've known the guy, and in twenty fifteen, when he said these things, he sounded kind of like a crazy person, and not that many people took him seriously. And now, ten years later, it turns out he was right. And actually, we were much closer to AGI than anybody thought in twenty fifteen. And now it doesn't sound crazy — all of this sounds totally possible.
I mean, the essay itself is pretty much the most techno-optimist thing I've read in a really long time. Some of the things that he says are coming are pretty wild: space colonies, fixing the climate problem, your intelligence on tap, being able to solve abundant energy. Yeah, I think he's basically ushering in this sort of Star Trek future on the back of literally having intelligence that can figure out all of physics.
Yes, Sam was always ahead. I remember back when he was starting OpenAI, one of the things that really motivated him to do it was he believed that when we actually had AGI, it would basically be better at doing science than humans were.
And therefore we'd be able to accelerate the rate of all scientific progress in every scientific field. That was part of the motivation from the very beginning, and I think it's really connected to o1. Even when Sam came and spoke at our batch a year ago — this was long before o1 was publicly released; it had been worked on in secrecy by OpenAI — the thing that he was most excited to talk about was giving GPT more advanced reasoning capabilities. And I think this is the reason: the thing that's missing from its ability to actually do science and accelerate technological progress is that it needs to be able to, like, reason through things.
One thing that really connects to o1 in particular: if you read one of the papers talking about it — its capabilities and potential for the future — it talks about how it does really well at chip design. And I remember, about a year ago, one of these conversations around "are we going to have AGI?" One of the arguments for it was that, well, at some point the AI will get good enough to just design chips better than humans can, and then it would just eliminate one of its bottlenecks to getting greater intelligence. And so it feels like we're already kind of on the pathway to that in a way that we just
weren't before. They're going to show a cool demo of
doing exactly that. It's fun, because we ran this hackathon with OpenAI. Sam came over and judged the winners, and one of the participants was actually doing chip design. The company is called Diode Computer — I think we mentioned them earlier.
What they're building is basically an AI designer for circuit board design. If you think about PCB design, there are four major steps, and the big, expensive part is that all of these need a lot of expertise. There's the system design: how do you really put together the architecture of it? How do you pick all the components, like the resistors, the sensors, the specific processing units?
Then you need to go to the layout and schematics, placing the components and doing the routing. And routing is known to be an NP-complete problem, because as you have different layers and circuits, there's interference. This is why companies like NVIDIA and Intel and Apple have half a gazillion electrical engineers. Up to GPT-4, which is what this company had built on, by putting in some constraints it was able to automate a lot of the schematic design, where you as a human had to assign which components needed to go on the design.
And to some extent, the routing of simple boards — which was still pretty cool up to that point. So they were able to automate all that. But the thing that they demonstrated now with o1 was that it was actually able to do the system design and component selection, which is crazy.
So it would be able to read all the datasheets and select the right components. The way the product would work is you'd say: I want to build a wearable heart rate monitor with an accelerometer and a microcontroller — very high level. And given those constraints, and looking at the database, it would be able to match a specific accelerometer and microcontroller and heart rate sensor, connect them, and just output the end result.
What we are trying to build today is a wearable heart rate monitor — something like you would see in a Whoop, for example. o1 is amazing, but one of the downsides is that it's a bit slow, so we actually cached the system diagram that o1 was able to generate.
It's pretty good. It has a USB-C connector and an IMU; we requested a heart rate sensor, and this is a microcontroller. So I'm going to show you how you can go from this
and build a PCB. So we're going to build the project. The output of this is code.
We actually use atopile, which is an electronics-as-code language, and you can see that it took all the blocks in the block diagram and stitched them together exactly how we want. The second step is that it's actually going to generate a layout for the board. And so now we can directly open it, and here you go.
Here's the board — it's quite nice. There are still a couple of fine-tuning steps required. For example, we could move this USB Type-C connector slightly, and we can change the shape of the board. But these are all the components, and then, thanks to the system that we've built, we can call the autorouter on this specific board and actually get a fully working printed circuit board back.
So this is actually one of the examples in the o1 paper — that it could do EDA — but they went a step further. The example the paper describes is the EDA step process: the set of tools for circuit design that do the design of the schematic, and also the simulation and debugging — it's easier to verify stuff than to write it. So this company actually went a step beyond the paper, because the paper covered mostly the last stages, verification and simulation.
I guess it's an interesting example of using different models for different tasks in a workflow. In order to actually pick the correct components off the bat, even before you place them on a circuit board, you've got to have — probably — RAG on unstructured data: you're taking unstructured data, like PDF documentation, and turning it into a structured form. It sounds like 4o-mini is being used to actually extract the data and then put it into a format for o1.
I think this is a very common pattern that we're seeing in a lot of the interesting products: you use different kinds of models. So yes, 4o-mini for PDF extraction, and then o1 for the reasoning, because it's actually very hard to select the components. I know Jared also works with a lot of hard-tech companies, and the whole process of selecting specific motors or sensors takes a lot of thinking for a human. Yeah.
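The multi-model pipeline just described — a cheap model for structured extraction feeding a reasoning model for selection — can be sketched roughly like this. The part names, prompts, and helper functions here are hypothetical, and the model calls are stubbed out as plain callables rather than real API calls, just to show the orchestration:

```python
# Sketch of the two-model pattern: a cheap extraction model turns
# unstructured datasheet text into structured fields, and a reasoning
# model selects components against the requirements. The models are
# injected as callables so the flow is testable; in practice each would
# wrap an API call (e.g. a 4o-mini-class model, then an o1-class model).
import json
from typing import Callable

def extract_fields(datasheet_text: str, extract: Callable[[str], str]) -> dict:
    """Stage 1: cheap extraction model -> structured JSON fields."""
    return json.loads(extract(datasheet_text))

def select_components(parts: list, requirements: str,
                      reason: Callable[[str], str]) -> str:
    """Stage 2: reasoning model picks a part against the requirements."""
    prompt = (f"Requirements: {requirements}\n"
              f"Candidate parts: {json.dumps(parts)}\n"
              "Pick the best matching part number.")
    return reason(prompt)

# Fake "models" standing in for real API calls.
fake_extract = lambda text: json.dumps({"part": text.split()[0], "vcc": "3.3V"})
fake_reason = lambda prompt: json.loads(
    prompt.split("Candidate parts: ")[1].split("\nPick")[0])[0]["part"]

part = extract_fields("MAX30102 heart-rate sensor, 3.3V supply", fake_extract)
choice = select_components([part], "wearable heart rate monitor", fake_reason)
print(choice)  # -> MAX30102
```

The point of the split is cost and fit: extraction over thousands of PDF pages is cheap-model work, while the hard combinatorial matching goes to the expensive reasoning model.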
The other thing I think is interesting about this demo is that during the batch, before o1 came out, Diode had tried to do this with GPT-4o and it just flat out didn't work. And then they basically tried the same thing, the same prompt, but fed it to o1 — and boom, it all just worked. And so there really is a sort of step-function capability unlock.
They were so excited when I talked to them, and they showed me — it was this big wow moment. They themselves were super impressed.
This hackathon, Diana — incidentally, I think it's a really interesting concept for a hackathon. Most hackathons are people building something that they plan to throw away. And the cool thing about this hackathon is it was all actual YC-funded startups that have real businesses — a real thing with real users. And they were all building actual features for their product that they planned to release to real users. It was really cool, I think, for us to see how o1 unlocked capabilities for real companies, not just toy projects.
The other demo was similar in terms of reasoning with o1. I think, Harj, you work with camfer, yes? So tell us, what does camfer do?
I mean, the tagline is "Devin for CAD." Basically, they let you create CAD designs with just natural language. You just type in something that you want to design, and it just spits out — makes a CAD design for you.
So: can you design me five airfoils, optimized for fifty miles per hour, with a minimum lift-to-drag ratio of fifteen at a five-degree angle of attack? It's very specific.
Normally this would require asking a mechanical engineer to be running all these simulations, solving through the equations. And what you're seeing flashing by is it running all the multiple simulations for them at the same time.
So it's actually kind of like a copilot for SolidWorks.
Yeah. Initially they were going to build this as a plugin to SolidWorks, but they went for the even harder technical approach, which is that this is just an executable that runs on your desktop, and it literally opens up SolidWorks for you and
then just starts clicking around in the UI, pretending to be a person.
And you saw — it was really cool — earlier they flashed the math traces. So o1 was actually able to write all of these equations, all these partial differential equations, and solve basically the Navier-Stokes equations to actually solve airfoils.
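For reference, the equations being referred to — the incompressible Navier-Stokes equations, in standard notation — are the momentum balance plus the incompressibility constraint:

```latex
\rho\left(\frac{\partial \mathbf{u}}{\partial t}
        + (\mathbf{u}\cdot\nabla)\,\mathbf{u}\right)
  = -\nabla p + \mu\,\nabla^2 \mathbf{u},
\qquad
\nabla\cdot\mathbf{u} = 0
```

where $\mathbf{u}$ is the flow velocity, $p$ the pressure, $\rho$ the fluid density, and $\mu$ the dynamic viscosity. Solving these around an airfoil geometry is what yields the lift and drag numbers in the demo.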
That is really cool. In the last episode we were talking about: what are you going to do with these two more orders of magnitude? Since then, Sam has told me that he actually wants to go to four orders of magnitude — to get to a trillion dollars in, you know, sort of spend. I mean, pretty wild.
But on the other hand, you can see where that might go. The airfoil is very impressive and complex, but it's sort of what we're capable of doing today, in twenty twenty-four. You could imagine abstracting that to understanding the nature of physics. I suppose it'd be sort of hard to see that in the current version of o1, but if the scaling laws hold, it seems entirely plausible that you could do far more difficult engineering challenges, such as room-temperature fusion. These are all sort of ultimately engineering:
fluid mechanics, weather prediction — all these complex physical phenomena that are very hard to solve, where you basically need PhDs. And to Sam's essay, this is a glimpse into what AI, and where o1, is heading, with this chain of thought and reasoning.
Especially — Sam's essay, the vibe of it, is that we're training intelligence, this new age of intelligence, and then the o1 paper — I think this whole idea that now you can actually give feedback not just on the output, on whether you got the correct answer, but on all of the steps to get there. You're basically teaching a model how to think, guiding it through the reasoning traces.
Will they go back and fine-tune the various steps for every output, to make sure the model is thinking the way you want it to think? That, again, is very much like the AGI conversations from a year ago — they all went in this direction: what happens once you can actually start teaching the models to think better, rather than just spitting out the correct answers? And then the scaling laws — there's just even more surface area for throwing compute at the problem. Right now, you can basically put compute at the inference step and iteratively have something come out — you can actually spend more money and more time and have a result that iteratively gets better, similar to what you might expect from a human scientific organization. Yep — maybe even more consistently.
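One simple way to picture spending inference-time compute to get a better result is best-of-n sampling against a verifier. This is purely an illustration of the general idea — not OpenAI's actual method — with toy stand-ins for the generator and verifier:

```python
# Spending more inference-time compute (a bigger n) buys more chances to
# find a high-scoring answer, provided a verifier can rank candidates.
import random

def best_of_n(generate, verify, n):
    """Sample n candidates and keep the one the verifier scores highest."""
    return max((generate() for _ in range(n)), key=verify)

# Toy stand-ins: "generation" guesses a number; the "verifier" scores
# closeness to the true answer, 42.
verify = lambda x: -abs(x - 42)

random.seed(0)
cheap = best_of_n(lambda: random.randint(0, 100), verify, 1)
expensive = best_of_n(lambda: random.randint(0, 100), verify, 200)
print(abs(cheap - 42), abs(expensive - 42))  # larger n lands closer to 42
```

The same shape applies whether "verify" is a unit test, a simulator, or a learned reward model: the stronger the verifier, the more useful each extra unit of inference compute becomes.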
Diana, do you want to talk about the architecture and how they actually created o1?
I think a lot of it is inspired by what they've been working on for many years, since the beginning of OpenAI. I think one of the inspirations is a lot of the work they did with Dota.
Remember, before OpenAI was famous for GPT, the one thing that they were kind of famous for — at least among people in the technology world — was Dota: winning video game competitions. That was the first big breakthrough.
And the thing is, back then, Dota wasn't something that took the world by storm — maybe only the research community knew about it, and it wasn't anything practical. But what was impressive was that it was beating a lot of the best Dota players. Dota is a complex game of resources and planning, right? And they implemented a lot of reinforcement-learning-type techniques there, which I think were also inspired, in the early days, by AlphaGo and AlphaZero — how AlphaGo wasn't just brute-forcing through it, but actually having a reward function and trying to solve towards it. This is even why there's so much talk about Q-learning, because it's arguably the fundamental idea behind a lot of this. So, yeah.
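For readers unfamiliar with the Q-learning idea mentioned here, a toy tabular sketch — purely illustrative, nothing like OpenAI's actual setup — shows the core mechanic: an agent in a five-state corridor, rewarded only at the goal, learns from that reward alone that stepping right beats stepping left.

```python
# Tabular Q-learning on a 1-D corridor: states 0..4, goal at state 4,
# actions step left (-1) or right (+1), reward 1.0 only on reaching the goal.
import random

def q_learn(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    q = {(s, a): 0.0 for s in range(n_states) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0  # reward only at the goal
            # Q-learning update: move toward reward + discounted best next value
            q[(s, a)] += alpha * (
                r + gamma * max(q[(s2, b)] for b in (-1, 1)) - q[(s, a)]
            )
            s = s2
    return q

random.seed(0)
q = q_learn()
# After training, stepping right should score higher than left in every state.
print(all(q[(s, 1)] > q[(s, -1)] for s in range(4)))
```

The relevant intuition for the o1 discussion: nothing tells the agent which moves are good — only a reward at the end — yet credit propagates backwards through the intermediate steps, which is the same flavor as rewarding a model's reasoning trace rather than hand-labeling each step.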
Because of Dota, they got really good at doing reinforcement learning. That's how they got it to work — they just had it play against itself, like, a million games. And then how does that connect to o1?
So I think this is where there's a bit of a big step function, because how do you then incorporate that into the family of GPT-type models? GPTs are all generative, based on predicting the next token from patterns — so how do you get those outputs checked for correctness?
So I think a lot of it is: you had to have a lot of data that was factually correct, fed into the model in training, and a reward function to get it to reason a bit more about the output and make sure that it's correct. So they've probably done a lot of interesting techniques with that — and this is really a lot of the secret sauce behind o1. Maybe one of the speculations we can make is that it's a lot of very factually correct information and
math and science problems and things like that, and
that's why it outperforms so much in those areas. Yeah.
One of the things I think is interesting, Gary, to your point about the scaling laws, is that a lot of people are really focused on the next scale-up of the models — like the GPT-5 series of models, which are being trained now; people are working on them, and they are going to come out.
But I think people may be under-appreciating how big an unlock this other direction is, because there are two research directions being explored in parallel, right? One is the straightforward scale-up of the underlying LLM. And then this o1 direction is a totally orthogonal research direction, in which you unhobble the model by having it do reinforcement learning while actually trying to do things in the real world and getting better at them.
The versions that have come out so far are still only o1-mini and o1-preview. And if you look at the performance they released, the full o1 model, which is coming out any day now, is a huge step function above even o1-preview — which is what enabled all these incredible results. And then Sam is telling us that o2 and o3 are not far behind. And so I think people may be under-appreciating just how big an unlock we're going to get.
Yeah — you know, o1 is also really opaque still. I mean, from a sort of business perspective, this is a new method. I think, at great cost to themselves, they actually did create a new dataset to train the chain of thought. It's essentially a giant dataset of: given task X,
can you break it down into parts? And what's funny is this sort of rhymes with what Jake Heller figured out for Casetext: if a given task you give an LLM is fluctuating, or not consistently giving the output you want — you're trying to make that particular prompt do too many things — you need to break it down into steps. And so Jake's prescription is really two parts: one is break it down into steps, and the other part is evals. And it sounds like, basically, with o1, the chain of thought will replace the workflow, so you might not need to break it down into steps yourself. But the evals are still really important. Even in the aftermath of that episode with Jake Heller, it sounds like some YC alums are reaching out and saying that episode helped them figure out and unlock something really big — like, a lot of people really were
just raw-dogging their prompts.
You have an example of a company you work with — they got to a hundred percent.
Yeah, just by doing exactly what Jake recommended, which is having a really big eval set and being very careful about testing every step of your reasoning pipeline.
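A minimal sketch of that eval discipline — a fixed, labeled set of cases, scored per pipeline step so you can see exactly where accuracy drops. The two-step pipeline here is a hypothetical toy, not any company's actual system:

```python
# Per-step eval harness: run each labeled case through every pipeline
# stage and report a pass rate per stage, so regressions are visible
# at the step where they happen rather than only at the final output.
def run_evals(cases, steps):
    """cases: list of (input, expected-output-per-step) pairs.
    steps: ordered list of (name, fn) pipeline stages."""
    scores = {name: 0 for name, _ in steps}
    for inp, expected in cases:
        x = inp
        for (name, fn), want in zip(steps, expected):
            x = fn(x)
            if x == want:
                scores[name] += 1
    return {name: hits / len(cases) for name, hits in scores.items()}

# Toy two-step pipeline: normalize text, then classify the intent.
steps = [
    ("normalize", str.lower),
    ("classify", lambda s: "refund" if "refund" in s else "other"),
]
cases = [
    ("REFUND my order", ("refund my order", "refund")),
    ("Where is my package?", ("where is my package?", "other")),
]
print(run_evals(cases, steps))  # -> {'normalize': 1.0, 'classify': 1.0}
```

In a real system the stages would be model calls and the cases would number in the thousands; the structure — expected intermediate outputs, not just expected final answers — is the part Jake's advice emphasizes.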
So one of the theories that I have now — if you superimpose that on the question of "what is a moat?" — I mean, that's one of the questions everyone is sort of asking themselves right now. Okay: GPT-5 is coming; two more, maybe four more, orders of magnitude are going to come, in terms of a trillion dollars spent on more training. That's pretty wild.
If I'm a wrapper company, or I'm trying to do vertical AI, or I'm trying to build my own business, what do I do? My theory would be: it's the evals. You write the ten thousand test cases. And the only way you get access to the test cases — the proprietary data that's not commonly available — is that you literally know the domain. That's what a bunch of our companies in this current YC batch are doing.
They're doing the hard work of enterprise sales. They're getting embedded and sort of going, quote-unquote, undercover into these sometimes really boring jobs, sometimes really complex or arcane jobs. It's everything from accounts receivable all the way over to financial accounting or forensic accounting — all kinds of things that are really not readily available. You can almost argue that anything that is consumer and publicly available on the internet is going to be in the base model. So then your moat, ultimately, is all of the other things that are not already online — whether it's, for Casetext, being a lawyer, or maybe over here in science, or in terms of building airfoils. What you're trying to find is the data that is proprietary in some use case, some vertical, that allows you to build that ten-thousand-test-case eval. And then that's actually the value. I mean, this is just a crazy theory, but it might be what happens.
There's an interesting implication for startups in everything you just said, which is: is it maybe worth thinking about which of your customers — picking the ones that will pay you a lot for that final ten percent of accuracy and perfection? I think camfer is actually a good example of it, where there's lots of interest in this sort of text-to-CAD design among hobbyists — people who want to prototype things and get something up and running very, very quickly.
But there's also a segment of the market where people are literally designing, you know, airplane parts, where there is no room or margin for error. And o1 makes it quite easy — or easier now, right — to get to the prototype, like eighty percent of the way there. But I think the strongest technical teams have the option to go all the way, and go after the segment of customers who want a hundred percent accuracy and will pay a lot for it.
Always go all the way — you have to go all the way.
But I think it's interesting, because one of the questions that gets pushed is: does o1, or does AI in general, actually commoditize a lot of the tech and make it less important to be a strong technical team? And it just seems unlikely to me. It seems like
actually, a lot of — all of the value is probably going to be captured by the strongest technical teams, who can build on top of whatever the base level is and capture that final ten percent. Right?
I think the moat is the evals, and it's also the UI layer and the integrations that go around it — because the prompts themselves are not a product. For a company to actually adopt camfer, it needs to actually integrate into their existing tools. It needs to have a well-thought-through UI and workflow, and all the tooling to sort of make the prompts useful.
Well, and then there's distribution, right? How do you actually get in front of people? How do you establish your brand? And then a perfectly good moat is difficulty switching, actually.
Once you have all your data in there and it's working, and you're paying ten thousand or a hundred thousand dollars ACV — sometimes a million to ten million ACV — man, it's going to be hard to switch. So all the classic moats still apply. This is still software, but you can unlock this capability. And this is a moment.
Another point to double down on the importance of evals: it still applies in the world of o1. As founders wonder how they're going to still build the best product on top of o1, it doesn't change — everything we discussed in the episode with Jake Heller applies. Because Giga ML is this company that, Harj, you work with.
Do you want to tell the full backstory?
We funded them for a completely different idea. They're an Indian founding team, and originally it was something around helping Indian high school students apply to US colleges. But they are
super cracked IIT AI engineers and researchers.
And we just said: this is not a great idea. AI is, you know, changing the world, and the research you've been doing at university is all aligned with — in particular — fine-tuning models.
Originally, it wasn't even an AI version of helping Indian high school students apply to college.
It actually is a classic YC story: these two clearly brilliant engineers, we don't like the idea at all, but we should just fund them anyway and hope something works out. And the idea they actually pivoted to initially, which they raised their seed round for, was helping companies fine-tune open-source models to get to equivalent performance to — at the time, really — OpenAI. But what I think, in general, we found is that those have not proven to be great businesses, because the cost of the models has gone down and the performance of the open-source models has gone up.
You just haven't had to fine-tune as much as people thought you would need to, because the models just keep getting better.
to because the models are just keep getting Better is kind of betting on the different on the opposite direction on a let's just trust that these models gonna keep getting Better and Better, which doesn't require much fine tuning.
Yep. And so they pivoted again. They were like: rather than just fine-tuning — we're really good at AI, we're experts at squeezing performance out of these models — let's just find an application for that. And they went into AI customer support, which is competitive. But again, I just think if you're an intensely technical team, you can still find ways to squeak out a competitive edge against other teams in the space. And that's what they found.
The problem with customer support is you're dealing with very squishy problems. There are just so many edge cases — the space of things that could go wrong as a customer rep is enormous.
It seems competitive, but the thing is, hardly any adoption has actually happened. It's not like the world has replaced all the customer support agents with AI yet. We can all see that it's going to happen, but it hasn't happened yet. And so from that
standpoint, it's wide open. What I found, at least when I last talked to the Giga ML team, is that part of the reason for the lack of adoption is that rules-based systems work fairly well for most of the simple cases, and there's just not the trust or belief that you can build AI that's good enough to solve the real messy stuff.
And so most companies that were pitched on an AI customer support agent were like: well, you can't actually go all the way and solve the hardest problems that take up most of the time, and the rules-based system works totally fine for everything else. And so I remember when they were first pitching this idea, people just felt like: overall, we don't need it; the rules-based system works totally fine. But it seems that is no longer the case,
yeah, because they now have some really legit customers — like Zepto, who just signed up. Okay.
So the last time I did office hours with them, they said that they automated thirty thousand tickets per day. And I think Zepto had more than a thousand people working on those thirty thousand tickets per day — which is thirty tickets per person a day.
And then the interesting thing was — on the one hand, this is probably one of the things that, frankly, everyone is a little bit worried about when they think about AI: are these jobs going to go away? And the interesting thing about the Zepto support job is that it is so not a fun job that I think the turnover rate was something like a few months — most customer support agents only wanted to work there for, you know, six months or less. So this actually is an interesting case: when something is incredibly rote — it's literally replacing button-pushing — these are sometimes not really, actually, good jobs. And hopefully those people can go and do something way more awesome with their time and their beautiful brains than rote jobs that
involve apologizing for Zepto orders that didn't show up.
Exactly, right.
But the crazy thing they figured out with o1 is that, to your point, Harj, the previous implementation before o1 was GPT plus rules and all that, and it could not handle most of the cases — it would have about a seventy percent error rate. Now, what they did is apply the technique Jake Heller described, really going hardcore on the evals, plus o1 during the hackathon, and they got to only a five percent error rate — which is more than an order of magnitude improvement.
The error rate on this is incredible too. This was on the complex cases — the things that are very complicated, that take up the most time and expense of all. Essentially, it could not do them before — just zero percent. And that's what they were encountering when they were selling.
People would say: actually, all of the stuff that we want to automate is these complicated edge cases that waste lots of time, and it just couldn't do any of that. But now it's at, like, eighty-five percent — and that was with o1-preview alone, right? So that's eighty-five percent.
It went from zero percent accuracy to eighty-five percent accuracy.
Yeah. So the interesting thing here is that o1 — it's not even o1 yet; it's o1-preview. And it's such a new technique that I think they're trying to protect their advantage right now. So if you use o1 in ChatGPT, it looks like it will tell you what's really going on.
But apparently they have a fake model that just spits out things to give you the impression that it's breaking it up into steps, and they've actually hidden it, because they don't want other people to have access to that data yet. But the next step seems like it needs to be some interpretation surface, some directability. And for that to happen — I'd be curious if o2 ends up having that. You want to be able to say: okay, show me the work, show me the steps; and at that step — the third step — can we rerun this, but I want it to branch in this way, or edit it?
I think this is one of the things that will be the next unlock. Right now, it outputs the plan — the chain of thought comes out — but you cannot edit it. Imagine: today, o1 just outputs whatever fifteen steps to the problem that you need to solve. Now imagine being able to edit each of the steps. Then you get into the super, super fine-tuned next level of Jake Heller.
So this is the worst that these models are ever going to be — right now, right this moment. And literally week to week, there are things that you couldn't do maybe a month ago that you can do really, really well right now. So that sounds like a pretty crazy moment in history.
So we've talked a lot about the kinds of companies and ideas that get a wave of uplift from this model improvement, from o1. What are the kinds of ideas that are the opposite — that don't benefit as much from o1? And perhaps people should even be worried, because they might just get deprecated by the improvements as o1, o2, and o3 come along.
I won't go all the way and suggest they should pivot, but I do think companies that are building AI coding agents, or AI software engineers, potentially have something to stop and think about here, because it seems like o1 in particular is outperforming on just, you know, solving programming problems, essentially. And I know, for some of the teams I've worked with in the past, a lot of what they've invested in is the chain-of-thought infrastructure behind this stuff — which o1 now just does, so it's not actually a leap forward for them; they've already invested in that. And so I
think that might be a function of basically the opaque nature of the chain of thought. And until you can get it to be directable — that's actually, frankly, what users of codegen are struggling with even right now. Once it starts going down a certain path, you can't really alter things. You want it to ask you: hey, do you want me to do it like this, or like that? And all of the systems are struggling a little bit with that right now.
I was going to ask the inverse question, Diana, which is: each new model capability unlocks a new set of startup ideas. Like, a year ago, startup ideas where the AI agent would talk on the phone just didn't work. We had a bunch of companies try it, and it just didn't work.
And over the summer, it really started working. In the trends from the past batches, anything around phone calling is blowing up right now, because the models finally work. So with this new o1 series of models, what are the startup ideas that just became
possible? To connect to Sam's essay: there are a lot of things that are going to make the world of atoms, the physical world, better, because o1 is really good at math and physics. So any startup working around mechanical engineering, electrical engineering, chemical engineering, bioengineering — all of these things that will really make our lives better — I think will really get an unlock, as we've seen from the demos we highlighted.
That's exciting. I mean, AI can't just be helping people click a little bit faster — it's got to be things that actually create real-world abundance for everyone. And it might just be a little bit of a race. I think there's sort of a fear of AI out there in society right now, and it's sort of up to the technologists to try to usher in this age of abundance sooner rather than later.
And if we can do that, then abundance will win out over fear. So with that, I think we're out of time for this week of The Lightcone. We will see you guys next time.