Hey, everyone. Welcome back to the Elon Musk Podcast. I'm thrilled to share some exciting news with you. Over the next two weeks, we're evolving. We'll be broadening our focus to cover all the tech titans shaping our world. And with that, our show will become Stage Zero. You'll still get the latest insights on Elon Musk, plus so much more. So stay tuned for our official relaunch as Stage Zero, coming soon.
Now let's get into this episode. How much smarter and more useful can an AI model really get before it starts coding entire applications from scratch, fixing its own bugs and writing its own documentation?
Well, OpenAI unveiled GPT-4.1, a new generation of AI models it claims is faster, cheaper, and significantly more capable than anything it's released before. But beneath the upgrade numbers and benchmark scores lies something more consequential. OpenAI believes this model and its smaller variants could eventually serve as the backbone of autonomous coding agents, the kind of agents that don't just assist software engineers, they are the software engineers.
OpenAI announced the new GPT-4.1 family of models on Monday, introducing not just the full-size version, but also scaled-down editions called 4.1 Mini and 4.1 Nano.
Each one is designed with a distinct balance of speed, size, cost, and power. These models are available exclusively through OpenAI's API, meaning developers integrating them into apps and tools will be the first to see how they perform in real-world environments. ChatGPT users, for now, are left out of the loop. So no prompting on OpenAI.com.
Now, what sets GPT-4.1 apart is its ability to comprehend massive inputs, up to 1 million tokens or roughly 750,000 words, far exceeding what GPT-4o could process. For comparison, that's longer than War and Peace and several technical manuals combined. Now, this makes it ideal for tasks requiring an understanding of complex and lengthy documents, such as legal contracts, software repositories, or even academic papers.
It also makes it more effective in multi-turn conversations where earlier context tends to get lost. Internally, OpenAI's own testing shows GPT-4.1 outperformed the GPT-4o model by 21% in coding-related tasks. Against the earlier GPT-4.5 research preview, GPT-4.1 showed a 27% improvement in the same category.
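To make the context-window figure concrete, here is a rough sketch of how you might check whether a document fits. It assumes a 1-million-token window and the ratio of about 0.75 words per token implied by the "750,000 words" figure. The function names are made up for illustration, and real token counts depend on the tokenizer and the text itself, so treat this as a back-of-the-envelope estimate only.

```python
import math

# Assumptions for illustration only: a 1,000,000-token context window and
# about 0.75 words per token (i.e. 750,000 words per million tokens).
CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Roughly convert a word count to a token count, rounding up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, reserved_output_tokens: int = 0) -> bool:
    """Check whether a document of this size, plus any tokens reserved
    for the model's reply, stays within the assumed context window."""
    needed = estimate_tokens(word_count) + reserved_output_tokens
    return needed <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical document sizes, in words.
    for words in (50_000, 500_000, 800_000):
        print(words, estimate_tokens(words), fits_in_context(words))
```

Reserving some tokens for the reply matters in practice, since a document that exactly fills the window leaves no room for an answer.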
It isn't just about solving more problems, though. It's about solving them in a cleaner, more structured way. GPT-4.1 was specifically refined to avoid unnecessary code edits, follow precise formatting instructions, and respect the intended structure of its outputs, including correct ordering and tool usage. And developers who tested earlier models often pointed out that they had to guide the model closely, correct its structure, or deal with inconsistent formatting.
GPT-4.1, according to OpenAI, has been tuned to avoid these common frustrations. One OpenAI representative noted that front-end coding tasks, the kind that require strict adherence to format and visual consistency, were a top focus of this update. But the performance jump isn't limited to coding. 4.1's improved ability to follow instructions makes it a better choice for powering AI agents, automated systems that perform complex tasks based on natural language commands.
Now, whether it's sorting emails, organizing files, or assembling documentation from various sources, GPT-4.1 can manage more intricate tasks than it ever could before, with fewer missteps.
Its capacity to comprehend longer context also means it can maintain more coherent and consistent actions over time. Now, in line with the new release, OpenAI will phase out GPT-4.5, which was a preview model, in July.
The decision seems driven by both cost and performance. GPT-4.1 offers either better or equivalent results, but with considerably lower pricing. The economic argument could be as compelling to developers as the technical upgrades. Now, cost is a core element of this launch. The full GPT-4.1 model is priced at $2 per million input tokens and $8 per million output tokens.
That's a substantial price cut compared to earlier models. The Mini version drops to 40 cents per million input tokens and $1.60 per million output tokens. And the Nano, built for speed and minimal cost, is 10 cents per million input tokens and 40 cents per million output tokens. Now, this is the most cost-efficient model OpenAI has ever released. However, smaller models trade some accuracy for efficiency.
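To see what those prices mean for an actual workload, here is a small sketch that turns the per-million-token rates quoted above into a dollar estimate. The dictionary keys and the function name are illustrative labels rather than anything taken from OpenAI's API, and the workload numbers are hypothetical.

```python
# Prices in USD per one million tokens, as quoted in the episode.
# The keys are illustrative labels chosen for this sketch.
PRICES_PER_MILLION = {
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request for the given token counts."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 50,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, 50_000, 2_000)
        print(f"{model}: ${cost:.4f}")
```

At these rates the full model costs 20 times as much per token as Nano, so a comparison like this is mostly useful for deciding which requests really need the larger model.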
GPT-4.1 Nano, for instance, prioritizes speed and affordability, which means it may not be the best option for tasks where precision is critical. Still, for developers who need fast responses for simpler use cases, Nano might offer exactly the right balance.
OpenAI tested the new models on SWE-bench, a popular benchmark for software engineering tasks. The full GPT-4.1 model scored between 52% and 54.6% on SWE-bench Verified, a human-validated subset of the benchmark. That's behind Google's Gemini 2.5 Pro, which hit 63.8%, and Anthropic's Claude 3.7 Sonnet, which reached 62.3%.
OpenAI noted that some solutions weren't runnable on their infrastructure, creating variance in scores. Now, the release comes amid intensified competition from other AI developers. Google, Anthropic and China-based DeepSeek are all chasing similar goals, building models that can perform complex coding tasks by themselves and eventually take over large chunks of software engineering workflows, which means software engineers will be laid off or fired or find new jobs.
Google's Gemini 2.5 Pro and Claude 3.7 Sonnet both scored well on public benchmarks and include their own long-context capabilities.
Now, the future for developers is getting a bit more tangible. Instead of having to stitch together multiple tools or tweak outputs by hand, they can begin to rely more heavily on models that understand their intentions, follow instructions precisely, and produce code that's ready to go to production. Now, this could all dramatically change how software is developed and who gets to develop it.
Now, if coding agents do become capable enough to handle large projects autonomously, the role of human developers could shift from creator to supervisor and then to just an idea generator. It's not a loss for these developers, though. If you have ideas, it's a change in focus. It means more people could build useful software without needing deep engineering experience.
GPT-4.1 isn't perfect, and it's not the end of this journey. But it marks a clear improvement over earlier models in the areas that matter most to developers: cost, reliability, instruction following, and code performance. For now, it's just a smarter tool. In the near future, it could be the foundation of all code being developed.
Now, 4.1 is faster, cheaper and more precise, pushing AI coding tools another step closer to building software all by themselves. Someday you'll have an idea and you'll be able to write it into a prompt: "Write me software that does XYZ." ChatGPT will create the whole thing from start to finish. Backend, frontend, database, everything in between. That day will come soon.
And hopefully I'll be around for it because I want to see that happen. My job for the last 20 years has been front-end web developer, and I'm excited about the future of GPT-4.1. It's going to be a wild, wild ride.
Hey, thank you so much for listening today. I really do appreciate your support. If you could take a second and hit the subscribe or the follow button on whatever podcast platform that you're listening on right now, I'd greatly appreciate it. It helps out the show tremendously and you'll never miss an episode. And each episode is about 10 minutes or less to get you caught up quickly. And please, if you want to support the show even more, go to patreon.com slash stage zero.
And please take care of yourselves and each other. And I'll see you tomorrow.