Gemini 2.0 features native image and multilingual audio generation, intelligent tool use, and the ability to accept streaming video as input. It can interface with Google products, execute code, and handle real-time interactions.
Gemini 2.0 Flash is faster and more powerful, offering significant improvements in coding and image analysis while maintaining cost and performance efficiency. Google is confident it will be the best model for most tasks.
The three agents are Project Astra (a universal AI assistant), Jules (a coding assistant), and Project Mariner (a web browsing assistant). Astra can handle complex conversations and access real-time information, Jules assists with coding tasks, and Mariner can control web browsing activities.
Mariner can take control of the Chrome browser, clicking buttons, filling out forms, and navigating the web like a human. It represents a new UX paradigm shift, allowing agents to behave more like users.
The new mode is called 'deep research.' It responds to prompts with a multi-step research plan, searches for and compiles information, and generates detailed reports with citations, saving users hours of time.
The Trillium AI chip offers a 4x improvement in training performance and a 2.5x improvement in training performance per dollar, with significant reductions in energy use. It is used for both training and inference.
Google's breakout AI product hit, Notebook LM, with its podcast summarization feature, helped regain narrative momentum. The recent Gemini 2.0 announcement further solidified its position, showing a return to form and leadership in AI.
Google drops a slew of new AI features showing just how far the company's AI strategy has come this year. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes.
Quick note, friends, before we dive in today: this episode was caught up in the travel dragnet, and so once again I am doing just a main episode. I think that probably on Friday we will do an extended news episode to try to catch up on all the headlines that we missed. A little wobbly to end the year, but we are making it happen, and at least you are not missing episodes.
So what we are talking about today is an absolute slate of new announcements from Google. It is very clear that they were not content letting OpenAI have all of its fun with its 12 Days of OpenAI, or Shipmas, or whatever they were calling it, and really wanted to come in and steal some of that thunder. We're going to talk first about what was actually announced. And then towards the end of the episode, I'm going to spend a little bit of time talking about what it all reflects in terms of where Google sits heading into 2025 vis-a-vis this AI race.
As I said, there was a ton that was announced, so it's going to take a minute to get through it all. The big banner headline was that this was Gemini 2.0. Almost exactly one year after their original frontier model, a model which at the time was trying to capture energy and attention as the first natively multimodal model, it's very clear where their heads are at when it comes to Gemini 2.0. It's right there in the subtitle of the blog post, Our New AI Model for the Agentic Era.
So what's actually in Gemini 2.0? First of all, it has native image and multilingual audio generation. It also features what Google is calling native intelligent tool use, meaning it can directly interface with Google products like Search and even execute code. It is also the first model to accept streaming video as an input. And so when you take it all together, Google now has a model that can view something in real time, hold a conversation, and take actions in the background. This release centered around improvements to Gemini Flash, which is the version of the model that's designed to be fast and cheap. The first generation of Flash was text only, but it is now fully multimodal and has all the features of the larger models. That means it can accept images, videos, and audio as inputs alongside text and produce audio responses.
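For the developers listening, here is roughly what a call to the new Flash model looks like. This is a minimal sketch based on Google's published google-genai Python SDK; the exact model name and argument shapes may differ from what ships, so treat it as illustrative rather than official.

```python
# Minimal sketch of calling Gemini 2.0 Flash with the google-genai SDK
# (`pip install google-genai`). Assumes a GEMINI_API_KEY environment variable;
# model name and details are assumptions, so check the current docs.
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Text-only request; images, audio, and video can be passed as additional
# content parts per the SDK documentation.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Summarize today's Gemini 2.0 announcement in two sentences.",
)
print(response.text)
```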
Tulsi Doshi, the head of product for Gemini, said, "We know Flash is extremely popular with developers for its balance of speed and performance. And with 2.0 Flash, it's just as fast as ever, but now it's even more powerful." Based on Google's benchmarking, Gemini 2.0 Flash is significantly improved in areas like coding and image analysis over Gemini 1.5 Pro. Google is in fact so confident that Flash will be the best model for most jobs that it's replacing Pro as the flagship model in the lineup.
Demis Hassabis, the CEO of Google DeepMind, said, "Effectively, it's as good as the current Pro model is, so you can think of it as one whole tier better for the same cost efficiency and performance efficiency and speed. We're really happy with that."
The audio generation feature, which is new to Flash, was described by Doshi as steerable and customizable, with eight different voices optimized across a range of languages and accents. The response to this was pretty good. Dan Mack on Twitter writes,
"I kind of hate when AI influencers try to engagement bait by saying this is insane, but I must say this is in fact insane. Google beat OpenAI to the punch by allowing real-time video and audio interaction on your desktop with Gemini 2.0 Flash. This is for sure a new era of the AI age." And while a massive update to the foundation model is a big deal, even Google pointed out that this is all about the agentic era. And so perhaps unsurprisingly, they showcased three prototype agents built on the new model.
The first is Project Astra, an updated version of their universal AI assistant. The assistant is now fully speech-to-speech. Google demonstrated its ability to keep up with complex conversations, transition between different languages, and access other Google tools. The assistant can now access real-time information through Google Search, Maps, and Lens, which is a feature we haven't seen from an AI assistant to date. Astra now has 10 minutes of in-session memory and can recall conversations you've had in the past to enhance personalization.
The second agent is a coding assistant called Jules. And Jules demonstrates what happens when you combine reasoning models with agentic capabilities. Jules can create multi-step plans to address issues, modify multiple files, and prepare pull requests for Python and JavaScript coding tasks and GitHub workflows. And if this agent is what's behind the announcement last quarter that more than a quarter of all code created at Google is now generated by AI, then we could be in for something great.
Google has designed Jules with a lot of human in the loop, frankly likely more than they need, in order to ensure safety. Jules will present a suggested plan before taking action. Users can monitor progress, and permission is requested before merging any changes. Jaclyn Konzelmann, the Director of Product Management at Google Labs, said, "We're early in our understanding of the full capabilities of AI agents for computer use." Jules is only available to a select group of trusted testers at the moment, but will be rolled out more broadly early next year.
The third agent is a web browsing assistant called Project Mariner. And this gets at one of the most important UX shifts that we're seeing, where instead of trying to adapt ourselves to what AI and agents can do, we're just trying to get agents to behave more like us. Anthropic made a bunch of news earlier this year when they showed their version of a very nascent agent that could actually point and click on your screen, and Mariner is of a similar ilk. The model can take control of the Chrome browser, clicking buttons, filling out forms, and using the web much like a person would. Google leaders called this a fundamentally new UX paradigm shift that we're seeing right now. Quote: "We need to figure out what is the right way for all of this to change the way users interact with the web and the way publishers can create experiences for users as well as for agents in the future."
The demonstration showed the agent building out an online shopping cart based on a grocery list. The process was painfully slow, with around five seconds of delay between cursor movements. The agent also got stuck and asked for assistance multiple times. For now, the agent can't use the checkout by itself, a safety limit so it doesn't need to handle credit card details. And from a functional standpoint, the agent does work like Anthropic's computer use mode, taking constant screenshots to determine its next move.
Because of this, Mariner can only use the visible tab in Chrome, so you can't use the computer for other things while the agent is in control. Google seems very comfortable with this, though. DeepMind CTO Koray Kavukcuoglu said, "Because the AI is now taking actions on a user's behalf, it's important to take this step by step. You as an individual can use websites, and now your agent can do everything that you do on a website as well."
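None of Mariner's internals are public, but the screenshot-driven pattern described here is easy to sketch in the abstract. The skeleton below is purely hypothetical, with placeholder functions that are stand-ins rather than Google or Anthropic APIs: capture the visible tab, ask a model for the next action, execute it, and repeat until the task is done or the agent asks for help.

```python
# Hypothetical skeleton of a screenshot-driven browser agent loop, in the
# spirit of what Mariner and Anthropic's computer use mode are described as
# doing. Every function here is a placeholder, not a real API.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done", "ask_human"
    target: str = ""   # element description or coordinates
    text: str = ""     # text to type, or a question for the user

def capture_visible_tab() -> bytes:
    """Placeholder: grab a screenshot of the active browser tab."""
    raise NotImplementedError

def propose_next_action(goal: str, screenshot: bytes) -> Action:
    """Placeholder: send the goal plus screenshot to a multimodal model
    and parse its reply into a structured Action."""
    raise NotImplementedError

def execute(action: Action) -> None:
    """Placeholder: perform the click or keystroke via an automation layer."""
    raise NotImplementedError

def run_agent(goal: str, max_steps: int = 50) -> None:
    for _ in range(max_steps):
        screenshot = capture_visible_tab()   # why the visible tab is required
        action = propose_next_action(goal, screenshot)
        if action.kind == "done":
            return
        if action.kind == "ask_human":
            print(f"Agent needs help: {action.text}")
            return
        execute(action)                      # one action per screenshot cycle
```

The constant screenshot round trips in a loop like this are also a plausible explanation for the multi-second delays seen in the demo.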
As an added bonus to preview what comes next, Google said they are testing agents that understand video games. They said the agents can, quote, "reason about the game based solely on the action on the screen and offer up suggestions for what to do next in real-time conversation." If you get stuck, the agents can also access Google Search to figure out what you should do next. Google is testing the agents on games like Clash of Clans and Hay Day.
Whether you're an operations leader, marketer, or even a non-technical founder, Plumb gives you the power of AI without the technical hassle. Get instant access to top models like GPT-4o, Claude 3.5 Sonnet, AssemblyAI, and many more. Don't let technology hold you back. Check out Use Plumb, that's Plumb with a b, for early access to the future of workflow automation. Today's episode is brought to you by Vanta. Whether you're starting or scaling your company's security program, demonstrating top-notch security practices and establishing trust is more important than ever.
Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001 and NIST AI risk management framework, saving you time and money while helping you build customer trust. Plus, you can streamline security reviews by automating questionnaires and demonstrating your security posture with a customer-facing trust center all powered by Vanta AI.
Over 8,000 global companies like Langchain, Leela AI, and Factory AI use Vanta to demonstrate AI trust and prove security in real time. Learn more at vanta.com slash nlw. That's vanta.com slash nlw. Today's episode is brought to you, as always, by Superintelligent.
Have you ever wanted an AI daily brief but totally focused on how AI relates to your company? Is your company struggling with AI adoption, either because you're getting stalled figuring out what use cases will drive value or because the AI transformation that is happening is siloed at individual teams, departments, and employees and not able to change the company as a whole? Superintelligent has developed a new custom internal podcast product that inspires your teams by sharing the best AI use cases from inside and outside your company.
Think of it as an AI daily brief, but just for your company's AI use cases. If you'd like to learn more, go to besuper.ai slash partner and fill out the information request form. I am really excited about this product, so I will personally get right back to you. Again, that's besuper.ai slash partner. Still, we are not done because alongside the agents, Google is also introducing a new reasoning mode for Gemini 1.5 Pro, which they're calling deep research.
This seems to be closer to a long-form research tool than a competitor to OpenAI's O1 model. In deep research mode, Gemini responds to a prompt with a multi-step research plan. Once revised and approved, the model then spends a few minutes searching for and compiling information. It then repeats the process several times, iterating on the information learned. Once complete, the model generates a report on the key findings along with full citations of academic sourcing.
Google is calling it an agent, as technically it completes this process using Google Search. David Citron, product director for Gemini Apps, said, "We built a new agentic system that uses Google's expertise of finding relevant information on the web to direct Gemini's browsing and research. Deep Research saves you hours of time." Wharton professor Ethan Mollick, who has gone deep on advanced academic uses of AI, seems impressed.
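Google hasn't published how Deep Research is actually wired up, but the plan, search, iterate, report flow described above maps onto a simple loop. The sketch below is purely illustrative; every function name is a hypothetical stand-in for a search backend and a language model, not Gemini internals.

```python
# Illustrative sketch of a plan -> search -> iterate -> report loop like the
# Deep Research flow described above. All functions are hypothetical
# placeholders; this is not Google's implementation.
from typing import Dict, List

def draft_research_plan(prompt: str) -> List[str]:
    """Placeholder: ask a model to break the prompt into research steps."""
    raise NotImplementedError

def web_search(query: str) -> List[Dict]:
    """Placeholder: return search results as {url, title, snippet} dicts."""
    raise NotImplementedError

def refine_queries(step: str, findings: List[Dict]) -> List[str]:
    """Placeholder: ask the model what to look up next, given what it found."""
    raise NotImplementedError

def write_report(prompt: str, findings: List[Dict]) -> str:
    """Placeholder: synthesize findings into a report with citations."""
    raise NotImplementedError

def deep_research(prompt: str, rounds_per_step: int = 3) -> str:
    plan = draft_research_plan(prompt)        # the user reviews and approves this plan
    findings: List[Dict] = []
    for step in plan:
        queries = [step]
        for _ in range(rounds_per_step):      # iterate on what was learned
            for q in queries:
                findings.extend(web_search(q))
            queries = refine_queries(step, findings)
    return write_report(prompt, findings)     # report cites the collected URLs
```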
He wrote, "The new Deep Research feature from Google feels like one of the most appropriately Googly uses of AI to date, and it is quite impressive. I've had access for a bit, and it does very good initial reports on almost any topic. The paywalls around academic sources put some limits." He also added, "I wish they had stats on the hallucination rate. I suspect better than an undergraduate, and it is more likely to miss subtle things than to get stuff completely wrong."
He continued, "One warning to instructors is that the new Google Deep Research feature solves most of the issues with AI-created research assignments. Pretty solidly well-organized and written with accurate citations, it makes it very easy for students to skip or automate their research work." Bilawal Sidhu called it essentially Perplexity on steroids.
Last couple of announcements. Google is, of course, deploying these new model capabilities everywhere, and one of the first uses is an upgrade to Google's AI overviews. The company says that the tool will now be able to handle, quote, more complex topics as well as multimodal and multi-step searches. They also said it can answer questions about math and programming. You'll remember that AI overviews were part of the narrative challenge for Google at the beginning of the year. Initially, they were widely mocked online due to things like suggesting glue as a pizza topping.
Still, Google CEO Sundar Pichai said, "Our AI Overviews now reach 1 billion people, enabling them to ask entirely new types of questions, quickly becoming one of our most popular search features ever. We'll continue to bring AI Overviews to more countries and languages over the next year." Lastly, on the hardware side, Google has unveiled the sixth generation of their Trillium AI chip. The chip is used for training and inference, competing with NVIDIA GPUs alongside Amazon's Trainium chip. They claim the performance improvements could fundamentally alter the economics of AI training.
They say that it delivers a 4x improvement in training performance compared to its predecessor, as well as a significant reduction in energy use. As a more tangible metric, Google is claiming a 2.5x improvement in training performance per dollar. Gemini 2.0 was trained exclusively on a Trillium cluster. And Google disclosed that they have built a 100,000 chip cluster, which they claim is one of the most powerful AI supercomputers.
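One way to read those two numbers together: if raw training performance is up 4x but performance per dollar is up 2.5x, then, assuming both ratios were measured on the same workload, the implied price per chip-hour is roughly 1.6x the previous generation. That inference is my own back-of-the-envelope arithmetic, not a Google figure.

```python
# Back-of-the-envelope check on Google's Trillium claims. Assumes both ratios
# refer to the same workload; the pricing inference is an assumption, not a
# disclosed number.
perf_gain = 4.0             # claimed training performance vs. the previous TPU generation
perf_per_dollar_gain = 2.5  # claimed training performance per dollar

implied_price_ratio = perf_gain / perf_per_dollar_gain
print(f"Implied cost per chip-hour vs. previous gen: {implied_price_ratio:.2f}x")
# -> 1.60x: each chip-hour costs more, but a given run needs far fewer of them.
```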
In their announcement, Google didn't provide any comparisons to rival chip makers, so it's a little hard to know how the new silicon stacks up. However, the chips are now generally available to Google Cloud users, so it probably won't take long for us to find out. Taking a step back, Google's brand story across the last couple years of AI has been a really fascinating one. I think if you had gone a few years back, Google was the default leader, both from a real and an imagined perspective when it came to generative AI.
The launch of ChatGPT and the ascendance of OpenAI really upset the apple cart. And it wasn't just that. Not only was there now a consumer product out ahead of Google, but in early 2023, Meta also carved out a totally different space because of their approach to open source. For most of 2023, Google felt distinctly behind when it came to generative AI. Indeed, even one year ago, when Gemini 1.0 was launched, the broad perception was that their hand had been forced, that the model really wasn't as far along and wasn't competitive yet with GPT-4, and wouldn't be until they released the most performant version of it early in 2024. Basically, Google had to do something, and so they had to announce Gemini 1.0 earlier than they might otherwise have wanted to.
Then in the beginning of this year, while we did get a GPT-4-class model in Gemini, we also got what I was just mentioning: AI Overviews in Search that told people to put glue on pizza. And of course, the whole controversy and dust-up around the historically inaccurate image generation, which forced diversity into situations in history which were very undiverse. Think black Nazis.
In other words, it was a pretty brutal beginning of the year for Google. Slowly but surely, though, that has changed. Undeniably, one of the big reasons for that is that Google got a breakout AI product hit in Notebook LM. The addition of the podcast summarization feature, which opened up this totally new set of use cases and ways of consuming information never before available, really got this ship pointed in the right direction and a ton of narrative juice back in the Google house.
That set the tone, I think, for this announcement, which was comprehensive, had a lot of great stuff in it, and was received incredibly positively. People are excited about these new features. They're excited about Astra. They're not dealing with this cynically. And importantly, from a brand perspective, it's more of a return to form than anything else. In other words, people are saying, oh, that Google that we know that we would have assumed would be a leader in this space, they are back.
And that, I think, is exactly where Google wants its brand to be. The company has an incredible number of advantages when it comes to the AI wars. They've got a slate of products to integrate AI into and to capture data from that potentially make their AI products not only very useful, but already plugged into the systems that people are using today. And so if they can continue this momentum, they could be poised for an even bigger 2025.
That's not to say that there aren't challenges, because as we've been discussing when it comes to agents, it's sort of like all bets are off and everything is up for grabs once again. Still, you got to think that the folks over at Google are a lot happier heading into 2025 than they were heading into 2024. And I think that they should be. For now, though, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always. And until next time, peace.