
How Pokemon Go and augmented reality are transforming how we’ll navigate the world w/ Niantic's Brian McClendon

2025/1/21

The TED AI Show

People
Bilawal Sidhu
Brian McClendon
Topics
Bilawal Sidhu: Even today's digital maps have limitations; they can't truly understand the three-dimensional world the way humans do. AI, especially when combined with gaming and crowdsourcing, can solve this problem. We can teach AI to see and understand spaces and places the way we do, which will fundamentally change how we navigate and interact with the physical world. In the future, we won't navigate only through our phones, but through a world where digital information is perfectly mapped onto every building, street corner, and landmark we see, reshaping not just how we navigate but how we interact with the physical world. Niantic is using data from Pokemon Go and Ingress players to build finer-grained maps for more precise localization. It's a bottom-up way of building maps: starting from the places people actually visit, using in-game locations such as Pokestops as a foundation to build high-precision maps that fill the gaps left by traditional maps.

Brian McClendon: Over years at Google, I developed next-generation 3D mapping technologies, including Google Maps Immersive View and the ARCore Geospatial API, which turned the world into a 3D canvas for augmented reality. Keyhole (later Google Earth) combined satellite imagery, maps, and terrain data to create a new way of visualizing the world, which aligned perfectly with Google's mission. After acquiring Keyhole, Google was able to invest in acquiring as much satellite imagery as possible, which dramatically improved map quality and ultimately led to Google Earth and Google Maps. Google Earth and Maps changed how people think about visiting and exploring places; with features like Street View, people can preview a destination before a trip, making it easier to travel and explore the world. Before Google, maps were made through manual field surveys and drawing, and the data was limited; Google built more complete map data through Street View, satellite imagery, and other means. Map data needs continuous updating, because roads, local businesses, and similar information change quickly.

Niantic crowdsources data from Pokemon Go and Ingress players to build finer-grained maps for more precise localization. Gaussian splatting and radiance field techniques improve the readability and realism of maps, making them easier to understand and use. Gaussian splatting is a new method for visualizing and reconstructing 3D data; it reconstructs complex objects such as trees much better, improving the realism of 3D models. The Scaniverse app lets users quickly create 3D models and add them to the map, which helps build the next generation of 3D maps.

Spatial understanding is about how objects relate to one another in three-dimensional space; computers and AI systems struggle to understand complex outdoor spaces. The goal of large geospatial models is to learn from massive amounts of image data to approximate human spatial understanding, enabling more precise localization and 3D reconstruction. Large geospatial models can update and maintain a map from very small inputs (for example, a single photo), which is more efficient than traditional approaches that rely on large amounts of sensor data. The Pokemon Playgrounds feature lets players place Pokemon on the map, allowing players to share virtual experiences and improving the believability of augmented reality.

A visual positioning system (VPS) determines a user's location by recognizing visual features in images, while the map-free ACE0 implementation uses a neural network to encode spatial information, improving localization accuracy. Ingress and Pokemon Go were designed to get people to explore the world together; building the map was a by-product of making the games better. Better spatial understanding will dramatically change augmented and virtual reality applications, for example more precise contextual suggestions and more realistic scene reconstruction. Large geospatial models can work together with large language models to provide more complete context and a better understanding of the environment.

To address privacy concerns, more AI models will run on-device in the future, customized to the user's language, location, and other information. In the short term, the 3D mapping market will be fragmented, but it will eventually consolidate around a dominant provider. One future trend is moving more functionality on-device to improve privacy and data security.




Some things you wouldn't mind being stuck with, like a huge inheritance. But a phone that has to be plugged in just right so it charges is not one of those things. Switch to Verizon and we'll pay off your old phone up to $800 via prepaid MasterCard for a new one on us. Just trade in any phone from our top brands with select unlimited plans. $829.99 purchase with new smartphone line on select unlimited plans. Minimum $90 per month with auto pay plus taxes and fees for 36 months required. Less $830 trade-in slash promo credit. Apply over 36 months. Trade-in terms apply. Pay off phone requires smartphone purchase and port in with new smartphone line on select plans.

Must provide most recent bill showing payoff amount of eligible phone. Additional terms apply. If you wear glasses, you know how hard it is to find the perfect pair. But step into a Warby Parker store and you'll see it doesn't have to be. Not only will you find a great selection of frames, you'll also meet helpful advisors and friendly optometrists. Yep, yep. Many Warby Parker locations also offer eye exams. So the next time you need glasses, sunglasses, contact lenses, or a new prescription, you

All of them? All of them.

Value surge. Truck's up 3.9%. That's a great offer. I know. Sell? Sell. Track your car's value with Carvana Value Tracker today.

Hey, Bilawal here. Before we start the show, I have a quick favor to ask. If you're enjoying the TED AI show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved and what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible. Remember printing MapQuest directions? Those paper maps seem ancient now that we all have GPS in our pockets.

But even today's digital maps have a major limitation. They can't truly understand the three-dimensional world the way humans do. That's where AI comes in. What if we could teach AI to see and understand spaces and places just like we do? And the solution isn't coming from a self-driving car or satellite imagery company. It's coming from millions of people playing a beloved video game on their smartphones.

Very soon, the way we navigate won't just be through our phones, but through a world where digital information is perfectly mapped onto every building, street corner, and landmark that we see, reshaping not just how we navigate, but how we interact with the physical world. I'm Bilawal Sidhu, and this is the TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything.

Some things you wouldn't mind being stuck with, like a huge inheritance. But a phone that has to be plugged in just right so it charges is not one of those things. Switch to Verizon and we'll pay off your old phone up to $800 via prepaid MasterCard for a new one on us. Just trade in any phone from our top brands with select unlimited plans. $829.99 purchase with new smartphone line on select unlimited plans. Minimum $90 per month with auto pay plus taxes and fees for 36 months required. Plus $830 trade-in slash promo credit applied over 36 months. Trade-in terms apply. Pay off phone requires smartphone purchase and port in with new smartphone line on select plans.

Must provide most recent bill showing payoff amount of eligible phone. Additional terms apply. This podcast is brought to you by eHarmony, the dating app to find someone you can be yourself with. What makes eHarmony so special? You. No, really. The profiles and conversations are different on eHarmony, and that's what makes it great.

eHarmony's compatibility quiz brings out everyone's personality on their profile and highlights similarities on your discovery page. So it's even easier to start a conversation that actually goes somewhere. So what are you waiting for? Get who gets you on eHarmony. Sign up today. It's a cold day here in Alaska, but there's one animal seemingly unaffected. Bright-eyed and determined enters the husky. Observe as they go up the mountain guided by pure instinct.

I've spent years at Google developing next-generation 3D mapping technologies, including Google Maps Immersive View and the ARCore Geospatial API, which turned the world into a 3D canvas for augmented reality.

And these innovations were built on the foundations laid by today's guest, who saw the future coming decades ago. Brian McClendon co-founded Keyhole, which became Google Earth, and went on to lead the teams that created Google Maps and Street View, tools that transformed how billions of people navigate the world. Now at Niantic, the company behind Pokemon Go and other games that blend digital experiences with the physical world, he's building something even more mind-blowing than that.

And yes, it involves millions of Pokemon Go players. Brian's been consistently ahead of the curve in predicting and building the future of how we interact with our world. So today, we're going to take a deep dive into his vision for the future of maps. Get ready to explore a world where maps are no longer just about getting from point A to point B, but gateways to entirely new realities that connect us more deeply to the world around us.

So the apartment you grew up in, in Lawrence, Kansas, is now the default location for Google Earth. What got you into geospatial technologies and mapping in the first place? When I was in Lawrence, I started out with the Atari 400 computer, you know, long before video consoles were really a thing. And I programmed that and got excited about CG.

And of course, video games were very popular in the early 80s when I started. So I got my degree focusing on computers and 3D graphics and spent the next 10 years building 3D graphics for Intergraph workstations and then Silicon Graphics. Let's skip ahead a little bit. It's 2004 and Google is buying the company you co-founded, Keyhole Inc. Talk to me about what excited Google about the tech you were creating there and what did it turn into?

Well, so when they were looking at us, we had been out in the public for about three years and we had this Earth Viewer application that ran on PCs. We combined satellite with maps and terrain data to create a new way of visualizing the world.

And, you know, Google's mission is to organize the world's information and make it universally accessible and useful. And this product that we had built completely aligned with their mission and their vision. And so when they saw it and sat down and started using it, they got extremely excited.

And I've heard the first thing anyone does when they first get their hands on Earth Viewer is basically key in their home address and see the camera zoom down into that. That's exactly right. And it's really a test of whether the product works because the way you test something is you go to where you know, and if it reflects what you know correctly, then you start to explore the rest of the world because now you're excited that it matches your view of the world.

But if we didn't have high-res imagery of their country or their suburban or rural town, they were disappointed. And so our goal at Keyhole was to get as much imagery as we could afford

But with Google buying us, you know, one of the big arguments for doing so is that they were willing to spend the money to get as much satellite imagery as we could handle. And of course, that technology turned into Google Earth and a bunch of it went into Google Maps as well. Can you talk briefly about your time at Google and the engineering efforts that you led there in the geo team?

So when we joined Google in 2004, we had the Keyhole product, but we were also sitting next to another small acquisition that was working on a Maps-based product. And initially, that Maps-based product was also PC-based, but very quickly they redirected to start working on a JavaScript web-based map viewer. And they built Google Maps and worked with us. As part of that, they built this very fast app

JavaScript, Ajax engine, the first sort of like, you know, client side JavaScript situation. And then they pre-rendered every single map tile out to a server. And what that means is from a speed perspective, Google Maps was faster on day one than anything anybody had seen before because MapQuest would take 20 seconds to render a tiny little map tile. And Google Maps was able to basically pull it up at the speed of your network. And they could pan, it could zoom.

That was very exciting. But we then did something even bigger is we added the satellite imagery that we had from Keyhole onto Google Maps very soon after. And immediately, the Google Maps users had the same experience that we'd seen at Keyhole and that we would see with Google Earth, which is, can they see their house? Many people, this was their very first introduction to satellite imagery. Yeah.

Yeah, it's like the best of like an abstracted map to get you from point A to point B, but also the best photorealistic rendition of the real world. And I can't like overstate how easy it is for people to take for granted that this exists now. But back then, like you said, people were literally printing out directions from MapQuest to get around from point A to point B. Along comes Google Maps and Earth. And of course, next thing you know, this thing is on the iPhone as well.

Yeah, the mobile phones, you know, right when we were acquired, did not have the power, like we talked about. The screen real estate and CPU and network were not good enough. But iPhone came out in 2007, Android in 2008, and suddenly the screen space was there. We finally had network bandwidth.

And, you know, there was a good enough graphics chip in there that we were able to get Google Maps running on an iPhone in 2007, I think, on launch 2007. And we're then able later to get Google Earth running on both iPhone and Android. That's wild. I got to ask you, what were some of the biggest changes in the world that you've seen as a result of Google Earth and Maps?

I think there's been a huge change in how people think about visiting a place and exploring a place. In the past, you'd read the guides about a place and you would talk to people and get recommendations. In many cases now, you can actually go to the place. You can go to Street View. You can look at the location. You can see where your hotel is going to be and see where the beach is and see how far the walk is to the beach.

And I think that preview of being there before you go there has made it easier for people to travel and to explore the world. And that's really, I think, you know, one of the goals is we want to make the world easier and more accessible, both virtually by on the computer screen, but also then opening it up so that people actually go out and experience it as well.

I love it. That's the perfect segue into something very close to my heart, having worked at Google Maps, building a next-gen 3D map on the foundation that you created. So I want to talk about the shift in how maps are made. For the uninitiated, the way Google or Apple makes maps, and you alluded to this with the expensive satellite imagery, is like these captures of the real world using satellite, aerial, and ground-level sensors. They're

There are these like super structured and semi-frequent captures of the world. But now at Niantic, you're trying to build a different kind of map in a different way. Tell us more about that.

Well, to bring it all the way back into history, before Google, the way that people built maps was that they would literally drive around in a van and take notes and draw pictures, and they would only visit the most popular or important urban locations. And so the map data that Google licensed in 2005 was the best you could get, but it wasn't great. It was based on government data plus as much work as the companies had put into it, like Navteq and Tele Atlas.

But at Google, we realized that the maps were not good enough, you know, and we realized this honestly when we put Street View in because the first thing you do is, you know, people looked at the Street View pictures, they see a photo of an intersection, and then they look at our map data and it's wrong. And they say, how can this be? The pictures that you gave us are clearly correct.

That project that we started was called Ground Truth for exactly that reason. We built our own maps. We started with government data, but we had the power and superpower of Street View and satellite imagery and a lot of elbow grease to start building maps. And we launched

U.S., Mexico, Canada in 2009 and continued on for five more years until we had basically mapped all of the larger countries in the world or the larger GDP countries in the world and used user-generated content to map the rest of them. And that's to a degree why Google Maps are better than many of the other providers because we were able to use this data to make a better map.

Now, the problem with any map is that things change. And so even if you were perfect at an exact snapshot in time, it immediately starts to fall out of date. You know, my rule of thumb has always sort of been that roads change quickly

at 1% to 2% a year, and that local businesses change at 10% to 20% a year because there's a lot of turnover in local businesses. And these are the most important things people search for: where am I going? What restaurant do I go to? Where do I get my dry cleaning or whatever it is? This data changes. And so keeping your map up to date is a really big challenge. And you need

people on the ground, you need new data, and you also need signal that, you know, things are good or not. And Google, I think, has done a good job for the basic data. But, you know, at the level that Niantic is approaching it, it's much different. We are very much on the ground collecting imagery at a level of detail finer than even Google has collected.
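To make the change rates Brian just cited concrete, here is a back-of-the-envelope compounding estimate. This is only a sketch: the 1-2% and 10-20% figures are his rules of thumb, not measured rates, and the assumption of constant, independent yearly churn is ours.

```python
# Back-of-the-envelope map staleness, using the rule-of-thumb churn rates from
# the conversation (roads ~1-2%/year, local businesses ~10-20%/year).
def fraction_still_correct(annual_change_rate: float, years: float) -> float:
    """Fraction of features still matching an old snapshot, assuming
    constant, independent yearly churn."""
    return (1.0 - annual_change_rate) ** years

for label, rate in [("roads @ 1%/yr", 0.01), ("roads @ 2%/yr", 0.02),
                    ("businesses @ 10%/yr", 0.10), ("businesses @ 20%/yr", 0.20)]:
    print(f"{label}: {fraction_still_correct(rate, 3):.0%} of a 3-year-old map still correct")
# At 20%/year churn, only about half the business listings in a 3-year-old
# snapshot are still right, which is why the data needs constant refreshing.
```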

And the challenge there is the closer you look at the world, the more change there is. So it's even harder to maintain an accurate map if you're trying to get detail down at the bench and park and chair scale of the world. Why do we need that kind of map? And why approach it in building this kind of crowdsourced manner that you are? One of the things that Niantic realized is that

To build an accurate localization system, to know where somebody is, you need data far beyond a street map. If you want to know exactly where you're standing relative to a statue or a park or even a sidewalk, you need a level of detail that just isn't available in Google Maps today. And so building this much higher precision localization system, which we call a visual positioning system,

required this high resolution data. Pokemon Go was launched in 2016. It was the first AR game. Augmented reality was enabled with Pokemon Go. You could take pictures of Pokemon in locations. And we got Pokemon Go players and Ingress players to actually actively choose to scan our Pokestops to start building this map for us.

And that data has been put together. We've created a VPS system with it. And now when you point your phone somewhere, we know exactly where you're standing. - In case you need a refresher, Pokemon Go is an augmented reality game for your smartphone that took the world by storm in 2016. Overnight, it felt like everyone was wandering around, staring through their smartphone cameras, hunting for virtual Pokemon to catch. Parks, streets, and even parking lots became hotspots for adventure.

Before Pokemon Go, Niantic created Ingress, a game with a more sci-fi edge. Instead of catching creatures, players split into two factions and battled for control of real-world locations, linking them together to claim territory. It's also interesting that you're talking about, as a complementary map to Google, the

What Google, Apple, and other companies are doing, it tends to be sort of like the drivable areas of the world and maybe some of the trekkable, walkable areas. But there's so many parts of the map that, especially as you mentioned, like parks and other places where people do congregate that are never mapped in that level of detail. And you're able to do exactly that. So it's almost like the inverse of sort of what the mainstream mapping providers are doing. So you can enable this kind of world-anchored AR experience.

And that itself is cool tech because you're totally right. You talked about visual positioning system. You know, GPS just is not good enough. Like if you've got five meters and then like 30 degrees of rotational accuracy, like the thing that you placed in the virtual world will seldom line up with the thing that needs to actually be there. But you can do far higher precision with the VPS maps that y'all are building. Is that right?

Exactly. And the way we think about it is that the prior methods that you and I both worked on at Google, you know, build the map from the top down, right? We started with satellite imagery and that inspired us to then use Street View and so forth. Niantic is building the map from the bottom up, you know, from the locations that people spend time. And we have this advantage that, you know, we have a pretty curated list now, you know, eight years into Pokemon Go and actually 10 or 11 years in, if you count Ingress, of, you know,

20 million Pokestops, 20 million Wayspots, we call them, that are sort of congregation points for people, landmarks for walking and, you know, are in those areas that you talk about. They're in parks and they move around, but they're not part of the official business or street sign locale. And so those points, you know, play a central role in the game, but also give us the basis for creating this map where these are the little islands that we will then build out from.

I love that. Yeah, that has to be some very interesting data of just like, what are the points of interest that are compelling to users on a neighborhood level? Like even in my neighborhood, like what are the landmarks that here in like Sunset Valley, Texas that people care about would be very different than perhaps, you know, the rendition or interpretation from a traditional mapping provider. But there's something interesting you said, which is,

You're talking about visual positioning system, which I kind of think of this like machine readable map, this map that a machine can look at. It'll compare your photo with the prior map that exists and figure out, ah, you're exactly located here on the globe.

but we're also seeing a boost in human-readable maps. So not just how maps are captured, but what we can do with them. Can you explain why the shift has been significant to our listeners, you know, who might not be computer graphics nerds like you and I? I'm specifically talking about Gaussian splatting and radiance fields here.

Here's how it works. First, you take a bunch of regular photos of a place from different angles. The system then creates what's essentially a cloud of special 3D points called gaussians. Think of them as these sophisticated light-carrying bubbles. Each bubble knows not just about its color value, but also how that color changes when you look at it from different directions. Much like how a car's paint might shift in the sunlight as you walk around it.

What makes this special is that it runs super fast. You can zoom around at like 100 frames per second, just like a video game, while still looking incredibly realistic. This is especially exciting because it means we're getting closer to easily capturing and sharing perfect 3D replicas of real places that anyone can explore on their phone, computer, or even a VR headset.
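For readers who want the math behind that explanation: the renderer sorts the projected Gaussians front to back at each pixel and alpha-composites them. The formulation below is the standard one from the 2023 3D Gaussian splatting paper, offered here as a sketch rather than Niantic's specific implementation. Here $\mathbf{p}$ is a pixel, $\boldsymbol{\mu}_i$ and $\Sigma_i$ are the $i$-th Gaussian's projected 2D mean and covariance, $o_i$ is its learned opacity, and $c_i(\mathbf{v})$ is its view-dependent color (stored as spherical harmonics), which is what gives each "bubble" a color that shifts with viewing direction $\mathbf{v}$.

```latex
% Front-to-back alpha compositing over depth-sorted, projected Gaussians.
C(\mathbf{p}) = \sum_{i=1}^{N} c_i(\mathbf{v})\,\alpha_i(\mathbf{p}) \prod_{j=1}^{i-1} \bigl(1 - \alpha_j(\mathbf{p})\bigr),
\qquad
\alpha_i(\mathbf{p}) = o_i \exp\!\Bigl(-\tfrac{1}{2}\,(\mathbf{p}-\boldsymbol{\mu}_i)^{\top} \Sigma_i^{-1} (\mathbf{p}-\boldsymbol{\mu}_i)\Bigr)
```

Because the whole render is just sorting and blending these primitives, it runs at video-game frame rates, which is the speed the host refers to above.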

For a long time, building a visual map of the world with satellite imagery was a top-down sort of 2D array of pixels kind of situation. But Google and others started collecting aerial imagery with oblique data and started making 3D buildings out of it. 3D reconstruction allows for a pretty good 3D model of the world. But if you want to see the pain, all you have to do is just go look at trees.

Trees are this super hard visualization and reconstruction problem. And anytime you zoom in on any of this data that you see from Google or Apple, trees are the worst part of it. Broccoli trees. Broccoli trees. There's many good reasons for it. For one, they move between every picture, so they're not the same thing twice. They grow, the leaves drop, and they have a huge amount of detail because they're effectively fractals themselves.

And so reproducing trees is really hard. And what we discovered, you know, with this paper that came out at SIGGRAPH last year about Gaussian splats, it's a new way of both visualizing and reconstructing 3D data. And what it does is it retains

not just the specific point locations of things, but also the lighting conditions from every angle. So it achieves a visualization realism that is far beyond what polygonal reconstruction was able to do. And in particular, with the transparency possibilities of Gaussian splats,

Trees come out really, really well. If you look at them, they look realistic. You can see through them and they're stable. And the added realism sort of gets us over the uncanny valley that I think that many prior 3D reconstructions have had where they aren't really believable. It doesn't look right.

It's kind of like going from like, I don't know, GTA 2 or 3 graphics to suddenly like GTA 7 is like suddenly the leap that we've had. Exactly. And, you know, urban canyons are an interesting problem because a lot of urban canyons have planted trees down at the street level. And those trees actually block the storefronts and make it hard to...

you know, give you a real visual cue about what the place looks like on the ground if you can't reproduce them well. And so even urban canyons benefit significantly from this new reconstruction.

Totally. And I mean, I remember playing with the previous instantiation of radiance fields, neural radiance fields a couple of years ago. And I was like, I needed this like beefy GPU or I needed to go beg somebody at Google for some TPUs to go process these data sets. And it took hours. And then we had a chance to meet at Niantic HQ earlier this year. And I was blown away with what y'all are doing with your app Scaniverse, which is basically 3D Gaussian splatting in real time on the phone in your pockets. Yeah.

And now you can bring those things on the map. So tell me a little bit about Scaniverse and your vision there. Yeah, so we acquired Scaniverse in its original form in 2021. And, you know, it is the preeminent 3D reconstruction for, you know, sort of old school photogrammetry and produces very nice models that people use in many different applications.

But early in 2024, we added Gaussian splats to the output of Scaniverse. And to use it, you can use an iPhone or an Android. You don't need LiDAR. And you just move the phone around an object and get it from angles high and low. And it's able to reconstruct the object or the scene or the room very, very quickly. And in particular, on an iPhone, it can build your Gaussian splat in about a minute.

And there are several reasons why this is good. One, you know, you get quick feedback. So like a Polaroid, if you don't like it, you can shoot another picture, you know, within a minute while you're still at the location. So that's a big advantage. Another is privacy. You know, this data doesn't leave your phone unless you choose to send it out. So you can build your model. You can decide if you like it. You can decide who you share it with. And literally, it stays on your device until you upload it.

We've added the capability very recently for you to add it to our map. And so there's now the Scaniverse map, which allows you to walk through and see all of the other scans that other people have uploaded, including ones that we've built from the Pokemon Go and Ingress scans as well. And so this map...

is our beginning of the next generation of 3D reconstructed maps. Ah, the seed map for this next-gen map. Are you excited about the fact that there's sort of like a standalone way for people to capture these type of things now that we have outputs like 3D Gaussian splatting versus, say, having it be a part of the Pokemon Go or Ingress experience? How do you think about that? Well, I think when we were just building the VPS, I think we struggled with something you mentioned earlier called the invisible map.

Right. It is an invisible map and it's very important and it provides great value, but there's no way for people to sort of understand, will it work here? Does it work, you know, how does it work? And the advantage of Gaussian splats is that the same data that we're using to build VPS can also be used to build Gaussian splats. And now we can visualize it.

We have data here. You can see it. And yes, you can also localize yourself at this particular location. So you can have either an AR experience or VR experience from the same location, depending on which way you want to go. And we've built products that allow you to develop both of those experiences on web or on any device you want.

That's really exciting because yeah, you're totally right. If you're, you know, an altruistic user of Pokemon Go and you absolutely love the game and you want to unlock certain experiences in the part of your city, it's one thing to be sitting there and kind of scanning to create this invisible map as you call it. It's another thing entirely to walk away with this artifact that's kind of useful in itself. It's like,

It's like literally a 3D copy of that place, right? It's like I've been describing it sort of as like memory capture. You capture a space or place once and then you can reframe it infinitely in post. And like you said, even unlock VR experiences. So it's kind of cool that we've got both halves of the coin now with this technology at our hands.

Yeah, and we've had many different, you know, paintings, then photos, then stereo, then video. This is a new form of quick 3D capture that I think retains a

better and more complete feel of a place than any single photo can alone. Because, you know, if you drop into it in a headset or you look at it, you know, and navigate around on your screen, whether your desktop or your mobile, you get a much better sense of what the location's like. And so that anybody can collect these and share these and publish these, I think is a superpower. And that brings us to a new announcement that y'all made recently, large geospatial models or LGMs.

Before we get into that, can you just explain to our listeners, like, what is spatial understanding and why do computers or even AI systems struggle with this stuff today? Spatial understanding is how objects in three dimensions sit relative to each other in the most simplistic form. If you're in a room, how does your chair sit next to your desk? Where, you know, who's in front, who's in back? These are generic problems that, you know, many people in many offices around the world have.

You can imagine training a model that understands almost every office configuration and has a pretty good idea of what that means.

Once you go outdoors, you discover that the world is much more complicated and much more variable across geographies. And the way I like to think of it is to use an example, which is for those of your listeners who have tried the GeoGuessr game, this is a case where you look at a picture in Street View and you guess where on the map this picture came from. And what's fascinating is just how different these pictures can be and how much information in a single picture

is contained about where you really are. But the differences are either very obvious or very subtle. And there's a player called GeoRainbolt, who is just amazing at this. And I like to think of him as building a neural net in his brain from studying hundreds of thousands or millions of these pictures, so that he now knows, either consciously or subconsciously, the signals that each of these pictures produces.

And what I think we're talking about with a large geospatial model is to reproduce that neural net by feeding it not a million photos, but hundreds of billions of photos.

And if we can do that, then maybe this geospatial model will have enough understanding to localize you, visually position you, 3D reconstruct the parts of the scene that you can't see because it's seen enough of the front of a church to predict what the back of a church looks like. And so it...

The opportunity is big, but the data set required and the understanding that a model would have to have is very, very large. And so that is, in fact, what we're working on.

I love that. And it sounds like you alluded to having 10 million scans, you know, sort of these seed locations worldwide. And basically the shift is we have these islands of spatial understanding, these individual maps where you can figure out exactly where the user is. But now you're working on kind of fusing that together. I think you gave a really great example of how that fusion works. Can we go a little bit more into that?

Like, why is this the better way to build this type of map than perhaps what other folks are doing? Well, with systematic coverage of the world, one of the challenges is that to keep it up to date, you have to revisit all of it all the time, or have a really smart model about visiting the parts that change all the time.

If to visit it, you have to send a sensor, a street view car, or, you know, a Waymo vehicle to go collect the data, fly a plane overhead. These are all heavyweight things that are not relatively easily repeatable or don't provide high coverage.

And if we can get to the point where a single photo can give enough information about whether the world has changed, and if so, how, then you now have the opportunity to update and maintain a map of the world from sort of very small inputs, single pictures every now and then.

And the rest of the system can detect that, yes, this is different than what was before. And we can then deduce the other changes around that photo. And so I think this is an opportunity to build a much better, more frequently updated and more accurate map. It's like you're building a map that is not only resilient to change because obviously, as you said, the world changes at different clips.

but it doesn't actually require you to map every inch of it with these really, really expensive sensor systems. I'm kind of curious, like, how do you see the product experience for games like Pokemon Go and other stuff that Niantic is working on evolving as you create this new type of map?

Well, one of the things that we'd launched just, you know, in the last week is what we call Pokemon Playgrounds, which is this ability to put Pokemon on the map at a Pokestop Wayspot location and leave them there precisely so that the next user can see them or add their own Pokemon into a collection. And so you actually build up a little collection of Pokemon, allowing for a sort of shared virtual experience and

One of the challenges with augmenting reality in general is the believability factor. And if everybody sees a different world, you can't talk about it like it's actually a world. It's just your vision and you're hallucinating. But if we see the same thing at the same place at the same time, then it's a shared experience. And you really are augmenting the world. You're not augmenting yourself.

It's like this next level from like dropping a pin and sharing it with somebody. You're kind of like annotating this 3D map of the world. And then anyone who comes there in that location, whether they're there then, I guess in this case, you know, even if they come after the fact, they can see that exact same annotation the way you left it. So there's a lot of hard engineering that goes into that. You keep talking about this term localization. Let's unpack that and kind of talk about the old way of localizing and the new way of localizing that y'all are investing in.

Sounds good. So we call it visual positioning system, and that's actually a very clear name. It's visual. What you see is what you get. And how you do that is in the old school is you collect a bunch of data. You try to build a point cloud of features of things that are easily visually distinguished.

And their position in the world is fixed. And so when you see them, kind of like a star field, they're all in a particular position. That helps you find where you are because you see these single features in a particular orientation. So that's the way it's always worked in the past. And what that is, is a point cloud map of the world.

With our map-free ACE0 implementation, though, we did something different. For each scene, we train a neural net model with the video scans that our Pokemon Go and Ingress players have provided, and we build up a neural net that has the same capability, but it encodes the space into this network. And now when we send a picture into this network, much like you would upload a photo into a large language model right now,

it goes through and it can tell you exactly where you're standing and it can do so more accurately than our prior algorithms that just use this visual point cloud. And so we call it ACE0 and it's taught us a lot about how to take those video scans and turn them into a reasonably sized neural net that encodes all of this information about a location.
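To make the contrast with the point-cloud approach concrete, here is a minimal sketch of the general scene-coordinate-regression recipe Brian is describing: instead of matching the photo against a stored point cloud, a small per-scene network predicts the 3D scene point each patch of the image is looking at, and a robust PnP solve turns those 2D-3D correspondences into a pose. This is not Niantic's ACE0 code; the network, grid stride, and camera intrinsics are hypothetical placeholders.

```python
# A minimal sketch of scene-coordinate-regression relocalization, the general
# recipe behind map-free approaches like the ACE0 work described above.
# NOT Niantic's implementation: the network, stride, and intrinsics are placeholders.
import cv2
import numpy as np
import torch

STRIDE = 8  # assume the network predicts one 3D point per 8x8 pixel cell

def localize(image_bgr: np.ndarray, K: np.ndarray, scene_net: torch.nn.Module):
    """Estimate the camera pose of a single query photo within one scene.

    image_bgr: HxWx3 uint8 photo.
    K:         3x3 camera intrinsics matrix.
    scene_net: per-scene model mapping the image to a (3, H/STRIDE, W/STRIDE)
               grid of predicted 3D scene coordinates (one XYZ per cell).
    Returns (R, t): camera rotation and translation in scene coordinates.
    """
    img = torch.from_numpy(image_bgr).permute(2, 0, 1).float().unsqueeze(0) / 255.0

    # 1. The network *is* the map: its weights encode the scene, and it predicts,
    #    for each cell of the image, the 3D scene point that cell is looking at.
    with torch.no_grad():
        coords = scene_net(img)[0]                        # (3, gh, gw)
    gh, gw = coords.shape[1], coords.shape[2]
    object_pts = coords.permute(1, 2, 0).reshape(-1, 3).numpy().astype(np.float64)

    # 2. The matching 2D observations are just the centers of the prediction cells.
    ys, xs = np.mgrid[0:gh, 0:gw]
    image_pts = np.stack([(xs + 0.5) * STRIDE, (ys + 0.5) * STRIDE], axis=-1)
    image_pts = image_pts.reshape(-1, 2).astype(np.float64)

    # 3. RANSAC + PnP turns the 2D-3D correspondences into a pose while rejecting
    #    bad predictions (people, parked cars, fallen leaves, and other changes).
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_pts, image_pts, K, None)
    if not ok:
        raise RuntimeError("localization failed")
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec
```

The RANSAC step is what gives this family of methods the robustness Brian mentions next: predictions thrown off by people, parked cars, or fallen leaves simply end up as outliers.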

So that's kind of wild. So instead of having an image come in, extracting some features, trying to match that against this 3D model that you've created offline, you kind of just provide this image to a neural network and you get back like, yo, this is where you're located.

And as like going back to the GeoGuessr example, maybe the neural net will be far more resilient at sort of figuring out how best to localize the user or figure out where they are in 3D space without just relying on these static features that don't change over time. Exactly. It seems to be more stable.

You know, as we said, the world changes all the time. I'll say that, you know, going back to our tree problem with localization, leaves falling from trees and sitting on the cement become, you know, visual features that actually make our point cloud solution not work as well. But the neural net is more robust to this. That's an example of how to overcome change and find the core static solution:

true, solid ground to localize against. And I think that's helped with our accuracy over time.

That's a great point. Yeah, trees, if you extract features from trees, they definitely change with the seasons and they may not be the most resilient anchors, if you will, to localize against. But you have this idea of sort of an AI figuring out how to do what the best GeoGuessr players in the world do, because I've seen some of those videos too. And you're totally right. It's almost like this person's a human VPS, like it's this kind of wild thing.

And I can only imagine what will happen when you start taking these large data sets of scans and start putting them together to see if you could create something that is better than the best GeoGuessr players. I mean, that is certainly a bar we would like to achieve at some point. And it's very good that we have somebody and actually a whole game and a whole set of competitors who are all in the space. And they all use different techniques. Each of their neural nets is tuned a bit differently and some are good at some things and others are good at others. But

I think it's educational to watch how they play and how they think about it because, you know, if you've watched Rainbolt play, he'll talk through how he, you know, some of the signals that he sees and why he makes some of the decisions that he does. But some, he can't explain it. You know, his brain just goes there. And that's because, you know, his neural net is baked in pretty good too.

You know, related to that, one question I know a lot of people have in their minds is like when y'all were building Ingress and Pokemon Go, like how much of the gaming and product experience is designed in mind that it's like equally fun to play, but also conducive to kind of building this sort of like map of the world?

I would say the gameplay and game design were focused almost entirely on getting people to explore the world together. That's Niantic's mission statement. And so I think the focus on location was really about how to get people outside, how to exercise and how to play games together. And I would say that the games themselves were not designed to build this map. The map became a...

follow-on side effect of making the games better. And once we started to want to know exactly where somebody was to decide whether they could spin a Pokestop or not, we realized that, you know, figuring out where you are in a GPS-denied or GPS urban canyon is really hard. And so if there was another better way to solve it, you know, could we create that? And that's probably the genesis of the VPS at Niantic.

Yeah, it's like it's almost like a means to an end versus an end in itself. And yeah, you just alluded to GPS denied. That's another great example where like you need visual positioning because, yeah, your GPS signal bounces around when you're like, you know, surrounded by tall metallic buildings that are, you know, reflecting that GPS signal as it gets to you. And I think that's

Everyone can relate to sort of walking down the street in one direction and realizing they're actually going in the opposite direction that they intended. Obviously, that problem doesn't exist when you're using visual positioning. Exactly.

So in the interest of it sort of being a means to an end and not the end in itself, let's talk about why is this such a game changer? Like, what are the sorts of possibilities this opens up for, let's say, augmented and virtual reality? It feels like we're already seeing the instantiation of the next computing platform and these devices are getting more and more real. What can we do once machines have mastered spatial understanding?

I mean, I think if you look at the focus of large language models right now, a lot of them are around providing assistance. And you explain your problem to them and they give you advice. And in some companies' views of the world, the goal would be that you can ask them a question and they'll use the context of the question, but they'll use all the other context that they know about you

and any other context that they can get their hands on. And I think one of the important bits of context that even a camera by itself doesn't have is, where am I exactly and what is around me? Now, a camera can see what I can see or where it's pointed,

But it doesn't know the rest of the story. It doesn't know what's behind me. It doesn't know what's behind that wall. And there's a lot more context that can be provided to an assistant who can include that information in their advice. And so I think contextual advice is one of the big applications there.

Building a view into the places you can't see for short-term navigation, for answering questions about facilities or safety. These things are derivable or knowable from

a larger model that can recognize sort of systemic examples of the problem. Because all of humanity, you know, streets, street corners and sidewalks are similar in many places in the world. And there's a pretty good guess that if this has a sidewalk here, this sidewalk will continue.

What can you do with that information? How can you visualize it? How can you tell users about it? And I think these models will be able to answer these questions without having continuous input, without being forced to have video on all the time. What you're saying is really interesting, right? Because you're totally right. When these large language models operate, they kind of have they're pulling on, as you said, world knowledge that they've seen on at least public content on the Internet.

All right, so this is pretty cool. Basically, this technology gives you the ability to search what you see, but also what you can't see. You can start asking all sorts of amazing questions like, hey, which of these hotel rooms is going to have an ocean view or a city view? Hey, how much sunlight is this room going to get? Or pull up the reviews for this restaurant that you're looking at. And of course, given Niantic's own focus, you can literally reskin the world for gaming applications. The sky is truly the limit.

But what's also cool is how these models can work in concert with large language models. You're trying to do the same thing for the real world, right? And I'm kind of curious, how do you think about these large geospatial models working in concert with these large language models? You mentioned, for example, not having the camera on the whole time. I mean, this gets into sort of like all the privacy questions people have about glasses, right? It's like, do I really want a camera on my face? Is it like LiDAR? So it's not, you can't see exactly what it is, but you can see the structure of

But it seems like there could be this sort of holy matrimony between what you're building and what other companies are building, especially given that they can understand visual inputs and even audio inputs now.

Yeah, I think one of the things that we're going to see, you know, large language models, these large geospatial models we talk about, image generation, all of these at the moment tend to be cloud-based, right? They tend to be big models living in the cloud. They work really well. OpenAI is a fine company to provide you with, you know, a service, but it means you're sending your data to them. And, you know, I do think that there is a privacy issue that is going to get resolved by having these models get small enough that they run on your device and really

Most of what goes into them stays on the device. A highly trained model tuned to, let's say, your language, your geography, your place could be much smaller because we know that you're in

Kansas City, or we know that you speak English, or we know that the visuals in question are going to be sports-based because you walked into a football stadium and you queried the, you know, football digest version of the model. So with, you know, I think there's going to be a sort of a tricky balance between what is on device and what is in the cloud, but piecing together these small models and

so that you can bring them onto your phone and answer questions without having those answers go back up to the cloud. That intuitively makes sense, right? Like if you think about, I don't know, like a taxi driver in New York kind of has a map of New York in their head. They're not constantly needing to reference Google Maps. Or one of the other analogies I use for visual positioning is like Shazam. Like Android's version of Shazam can figure out what song is playing automatically.

without needing to send that audio up to the cloud, your device just knows the signature of all these various songs and you just do that locally. And then, like you said, for certain experiences, when you need to send that up, you can or other insights that are derived. It really does feel like we're seeing right now kind of a lot of the magic happening in the cloud, I guess, because it's like easy to manage, build and serve. But yeah, like.

as these things get out into the wild, like why do I need to send, you know, a photo up or a video feed up just to figure out where I am? I'm excited about that. How far out do you think that is? I mean, for specific problems, it already exists. We've seen that like in the large language model world, we've seen 70B models train 3B models to train a 1B model that does exactly this one task really well. And a 1B model can sit on your phone and be performant and not even burn a lot of power.
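As a rough illustration of the teacher-to-student step being described here, a small on-device model can be trained to imitate a large cloud model by blending the usual hard-label loss with a soft-target loss on the big model's output distribution. This is a generic knowledge-distillation sketch, not any particular company's pipeline; the temperature and mixing weight are arbitrary placeholders.

```python
# A generic knowledge-distillation sketch: a large "teacher" model supervises a
# small "student" that can run on-device. Temperature T and weight alpha are
# placeholder hyperparameters, not values from any specific system.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the teacher's soft targets with the ordinary hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # standard temperature scaling
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```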

And I think the same will be true of these other models over time. And, you know, they can be sliced and diced in many different ways. Like they can be task customized. They can be geographically customized. Like I said, they could be language customized. So if you once you know the subset of the problem that you're really trying to solve, the model can be downloaded and

And after that, everything is on device. And that's, you know, from a privacy perspective, it's something I'm very excited about. That's magical, right? Yeah, you've got your like I've been playing around with the new Apple Intelligence. It's got a bunch of these like rewriting models that are running all on device. Maybe you have some future instantiation of like a distilled large geospatial model that like knows its way around the city. And so I just point my camera and get an answer about like what I'm looking at.

And then to your earlier point about like kind of x-ray vision, like what's even like behind the buildings all without the data leaving the device. That's freaking cool. At Verizon, anyone can trade in their old phone for a new one on us with Unlimited Ultimate, which means everyone in your family could get a new phone and stay on your family plan, keeping you close. Hey, mom, you seen my toothbrush? Oh, maybe too close.

Trade-in and additional terms apply. See Verizon.com for details.

It's 2025 and a new year means new opportunities. Been thinking about starting your own business? Shopify's got you. Shopify makes it simple to create your brand, launch your store, and get your first sale. With customizable templates and powerful tools to sell on social media, you can start selling everywhere people scroll. They'll handle the shipping, taxes, and payments so you can focus on growing your business. Don't wait. Start today and make 2025 the year your idea takes off.

With the Redfin app, you'll know the moment your next place hits the market.

Now, before we go talking about the future, I did want to get your take on sort of other approaches to crowdsource mapping, right? Like,

Like the two companies that come to mind for me are Hivemapper and even Meta's Mapillary, which was an acquisition. And they're kind of focusing more on like sort of dash cams that you pop on, you know, in your vehicle as you're driving around. You've got a bunch of ride sharing vehicles, like fleet telematics companies, like FedEx delivery drivers that got all these cameras. What do you think about some of these other approaches to crowdsource mapping?

I think that the, I mean, obviously they're collecting a lot of great photos of the world. The struggle with Mapillary and with Hivemapper is really the pose on these photos is not good enough to do Gaussian splats, for example. They're just, they're not. And I think, you know, we're much more interested in a reasonable frame rate video, you know, with the camera changing orientation and being able to track the IMU of the camera at the time.

Yeah, I think that makes sense, right? Like Mapillary, maybe that one photo every 10 meters or whatever is enough to figure out, oh, the speed sign changed. It's now 35 miles an hour instead of whatever it was before. But it isn't enough to create this like 3D rendition of the world. You just don't have enough views at it.

What are the incentives for creating this map? Right. Like, do you have any guesses on like how many people it would take to sort of map the world in this sort of like decentralized crowdsource fashion and what's in it for them?

In the early days, it's the inverse of the Google Earth problem. So with Google Earth, you would zoom in and you would discover whether your house was in high res or not. And you'd either be happy or sad. Our answer is you can put your location, your neighborhood onto the map. You can solve this problem yourself. And what we found is people are really proud of their neighborhood, of their city, of their landmarks.

And so being able to have a high quality representation of their neighborhood is, I think, a strong motivator, not for everybody, but for enough people that I think we can put a pretty good dent in this problem. Yeah, it makes sense. Like where the users are, they can kind of create this map. And I think that brings us nicely to the fact that like

You're using this new kind of 3D map of the world, not just for, you know, your own first party experiences like Pokemon Go, but it's a platform that other developers can then build on, right? And so if they want to unlock this type of augmented reality experience, wherever they may be, they've got means to sort of put things on the map and then start building those experiences without needing to go through, you know, like the mainstream mapping companies to have them go map those places for them.

That's exactly right. I mean, one of the things we provide with our data is APIs. We have a Unity development kit called ARDK that allows you to bring this kind of data for both VPS into Unity. But we also have this new Niantic Studio, which is a low-code, no-code way to author experiences initially for the web, but in augmented reality and virtual reality, where you get to pick

all of the locations from the million locations we've already mapped. But if we don't have that location, you can take Scaniverse out and go map your location or your 10 locations for your game or your experience and build up a great experience around it. And I'm really proud of the Niantic Studio experience and how easy it is to use.

That's really exciting, right? Because you're right, like there's suddenly a barrier with the moment you bring up Unity or whatever game engine that you're using, you suddenly need development experience. But to kind of like literally be able to capture the world and then turn it into a canvas for your creativity and do it in this like no code fashion, that's really, really exciting. But there's also like non-entertainment use cases here, right? And of course, y'all are focused on the Niantic Spatial Platform.

And I saw a bunch of use cases there, like spatial planning, warehouse logistics, audience engagement, remote collaboration. What are you most excited about? I think the thing I really am impressed by is this idea of a real-time shared AR, VR experience. So let's say you send an operator out to a site that's got a device that's got a problem.

that operator can scan that device and build a 3D visual map of it and upload it to the cloud and immediately show it to somebody who's back at their desk or back on a VR headset. And that user can then see

the VR user and the VR user can see the AR user and they can talk about exactly the same thing. One is there virtually through VR and the other is on site. You know, one of the things that I've seen just in general is that the level of knowledge that, you know, is needed to solve some problems is very high, but the ability to get out into the field is also a lot of work. And if you have to do both, your coverage is going to be much less. And so this idea of having, you know,

multiple data collectors and fixers with the manuals open and everything I think is going to change how a lot of repairs and a lot of products are built. I love that. It's like, I find myself always going back to this, like Peter Thiel quote of the world of bits is easier than the world of atoms. And you kind of have this technology that in a sense is connecting bits and atoms and like

A field service expert going to, I don't know, frigging fix like a power issue or like a 5G tower or something and needs to get that one expert that's sitting in some part of the world to weigh in on it, like as if they're actually there. Sounds really exciting. And it's cool because you can immediately see why that same capability to freaking leave that like Pokemon, you know, on that sidewalk for somebody else to discover is exactly the same technology that enables this far more utilitarian and useful use case.

That's right. And we're very excited. We've got several partner customers building experiences to solve the field-service-issue-

style problem right now. But you're right. It's this ability to take consumer capabilities and build out enterprise products, something that we learned way back in the day at Keyhole, right? When we started Keyhole in 2001, that was the end of the dot-com era. And we thought we'd get millions of users and monetize later. Well, the year 2002 came around and we had to go to enterprise. And so what could we do with this satellite imagery product

Well, we built enterprise services and enterprise targeted certain verticals that really wanted this capability and were willing to pay for it. And so that was how Keyhole survived sort of the dark days of 2002 and 2003. But by 2004, we were doing pretty well when Google bought us.

And now it does feel like there is a bit of this chasm now with like these hardware devices, right? Like I was at the Snap Summit playing with the Snap Glasses. Of course, you all have a partnership there. Y'all pretty much have a partnership with all the AR glasses creators. And it's like, yeah, these are still kind of dev kits. They're still not quite there yet. The main experience is on the phone, but we can see it's just a couple of years out. But I can imagine an enterprise, you know, shelling out for like $1,500 pair of glasses is like the ROI is like immediately clear.

Exactly. And I think, you know, you're going to see there will be interesting applications for mixed reality devices like AVP and Quest, but

because they can do the equivalent of AR, except it's MR, but many of the same applications work. And, you know, obviously the headset on the AVP is pretty big, but it's beautiful. And for a subset of problems, it might be useful today. The Quest 3 is a lot cheaper and a bit easier to wear and allows for many of those same MR experiences with the same co-localization that you were just talking about.

I love that. Yeah, it's like I've got the AVP, the Apple Vision Pro myself, and it's like it is sort of like this glimpse into what amazing AR glasses will enable. I guess in a sense, it's like, you know,

You know, without waiting for like, I don't know, the likes of Meta to take that $10,000 pair of Orion glasses that they showed to make that a mass market product. You can kind of just put cameras at the front of a VR headset and pass through reality in a sense. And then but still build out these experiences that will transfer over beautifully. But at the same time, we also have like the Meta like Ray-Ban glasses and these lighter weight form factors that.

Are you excited about bringing geospatial intelligence, especially when you combine them with, we talked about large language models, to these lighter weight form factors that are more like a microphone and a camera on your head and maybe a really small display, but sometimes not even that? There's a set of capabilities you can add around localization and where are you? These Ray-Bans are going to be an interface that Meta is obviously hooking them up to their AI and Gen-AI interfaces.

They need context for input, and the camera can provide some of that context. The ability to turn a camera photo into additional context will help with the assistant model Meta is working on for the Ray-Bans.

But I think once you start adding a display, it gets better. And I do agree, the Snap Spectacles are actually really impressive, in the sense that Evan has a vision that he will build these things. More than almost anyone else, he's really focused on the consumer use case for AR. So I think we were surprised at how good this version of Spectacles is. It's still not a consumer device; it is a DevKit++ device.

But it points at a good future. The Orion glasses seem awesome, so we're very happy that Meta's in the game too, and obviously they're investing a lot in this.

But I think the interim step of MR is more interesting on the enterprise side, because consumers are never going to wear a Quest headset outside. An enterprise user, though, may well feel that wearing one of these makes them a better operator, a better technologist, whatever it is, and they're willing to do it to do their job better. So I think we will see MR use cases in the enterprise before AR takes over on the consumer side.

That makes total sense. Yeah, even despite Apple's best efforts to make the Apple Vision Pro cool, with T-Pain walking around wearing one, I stopped seeing those at malls very quickly. It became a bit of a trope. But you're right, it is the closest thing we have to that North Star experience, and it'll be very exciting. Fast forwarding a little bit: how do you see all these advancements affecting how cities are designed and how public spaces are used in the future?

One of the things I've always wondered about is signage. Signage is both good and bad. It's good in the sense that it makes it easier to understand where you are, but if it's not in your language, that's a problem; you don't know exactly what the sign means. In a world where everybody had AR glasses, you wouldn't need to label anything, because the labels would all be augmentations that everybody gets to see in their own language, at a density that's relevant to them. That's pretty exciting, but it's only going to happen when everybody has the glasses, because at the end of the day, if you're deviceless, you still need to figure out where you're going.

I regularly look at cities when I go to visit, and it's very interesting how dense signage is in some cities versus others, Tokyo being one that is unbelievably dense. The problem there is that at least half of it I don't understand. Thankfully, half of it's in English, which is very useful, but the important stuff is many times only in Japanese. And I will learn kanji and katakana, but not very quickly.

Yeah, and Japan also seems to be the final boss of 3D mapping. There's so much stacking, 3D-ness, even 4D-ness, and nesting going on that building a model to encapsulate some of the denser Japanese cities feels like it's going to be the last thing, and maybe the highest standard, for 3D mapping.

Well, it definitely was for Google, because I think it was one of the last countries Google actually launched its own map data for. The existing in-country supplier, Zenrin, was really good at what they did, but they did it by employing hundreds of thousands of people to go collect that data, because Japan took its maps very seriously. It was very hard to keep up with that, but eventually Google got its data good enough that it went beyond that capability.

One other thing I want to hit on: you've talked about Niantic's central mission, and also about the more utilitarian version of an AR or VR experience, where you use VR to preview a place you want to go, like Immersive View in Maps, or the experiences you mentioned where you can remotely see what a field service expert is doing.

But it seems you, and Niantic as a company, are far more bullish on AR. And that's something y'all share with Snap; Evan has been very, shall we say, blunt about his take on VR. I'm curious: why are you so much more bullish on AR? Do you think there's a world in which VR will be just as compelling? Because people have consoles, people have desktops we use every day, and those aren't necessarily experiences that are always anchored in the real world.

I think VR is a fine experience. My PC and, as you say, game consoles are all great examples of how VR will continue to be consumed, because even without the headset, virtual reality through my desktop window here is pretty good, especially with Microsoft Flight Simulator 2024 coming out. The reason AR is more interesting is that we've already proven there are three to five billion of these smartphone devices, and the number of minutes we spend on our phones, not in a VR experience but in a data experience, a video experience, a text experience, is huge. And AR glasses are a better screen for that experience to happen on.

And they're more convenient. They free up your hands, at least to some degree, and they still allow you to look up at the rest of the world. I believe these AR glasses are going to replace phone screens, and that's why we're bullish.

That makes a ton of sense. Yeah, my phone is definitely my primary computing device, and I think it is for everyone else too. There are still contexts where you want more; we're probably both staring at multi-monitor setups right now, locked down to get some work done. But when you're out and about, it's the phone. And it's so weird that we have this slab of glass we have to keep looking at. I just can't wait for that to change.

You go to any of these concerts. I'm in Austin, Texas, and there's a before and after of ACL, Austin City Limits. The skyline certainly changed, but the other thing that changed in the footage is that back then not everyone had their phones up; people were completely lost in the experience. That's where we could use technology that connects us more to the world around us.

Exactly. I think phones get in the way of that. They cause you to look down and not at the world, and they cause you to hold the phone up to take a picture, both of which glasses could replace. Neither of those is a healthy experience for us.

So as we wrap up here: when you think about a 3D map that's ubiquitous, one we need for all of these devices, whether that's glasses, phones, or food delivery robots on the sidewalk, it feels like we're moving toward a more connected world. Yet it also feels like we're going to have these overlapping maps of reality, kind of like we do today, where there's a handful of maps of the world.

Is that the future for this new kind of 3D map, or do you foresee some sort of consolidation? Because when I think about GPS, there are a couple of different satellite constellations used in different parts of the world, but it's largely a public good that anyone can use, right? Maybe the public sector subsidized it. How do you see it playing out for this next-generation 3D map?

I think in the short term there'll be fragmentation, because there'll be subsets of the problem solved by different maps for different reasons. Waymo has an excellent map of Phoenix, San Francisco, and Austin, and they use it very specifically so they can drive safely around the city without a driver. So their map is their solution. I think the maps we're talking about will be applied in certain areas for localization, for VPS, like we're doing right now, but more generally to provide this context. For a while there will be fragmentation as the market finds itself.

And then there will be a race to quality and completeness: who's the most accurate, the most up-to-date, and who provides it most effectively. And there, like Google in 2008 and 2009, a winner will emerge when one becomes better than the rest and stays that way for several years. I know that frustrated Apple quite a bit, but I think that will eventually happen here. In the interim, there will be several providers trying to solve this problem, and they'll each solve it in slightly different ways. They'll share as much data as possible. We already have mapping data consortiums like Overture that are getting close to open sourcing important parts of the mapping data. But I think this next set is not going to move quickly through the open source world. It's a harder problem, because it isn't trivially solvable, and it's not easily copyable either, given the amount of data involved.

Yeah, this is where what you're building, and what all these other companies, including Google, are building, is different from the large language model problem. Everyone's kind of just scraping the open web, and that's something you can do easily. Turns out scraping the physical world is a lot harder. Again, back to the world of atoms being way harder than the world of bits.

Exactly.

All right, so, last question. At Google, you predicted that DSLR cameras would be used to create 3D models of the world. While smartphones ended up taking the lead, you seemed to know what was going to happen before most did.

What are some of your current predictions for the next big technological shift?

I think a lot of capabilities are going to go on device. Phone memory, phone processing power, and battery are capable of solving a subset of what we currently use the cloud to solve, and I do think privacy is going to be a big issue that will push that to happen. One of the challenges we have with human knowledge is being highlighted by gen AI and large language models right now: the best language models we have are built on the best data they could scrape or collect or combine, and it isn't perfect. They sometimes hallucinate because there are either holes in the data or the model gets confused and gets its wires crossed. So how do we get to a point where, instead of oscillating around the inaccuracies of the data, we start to focus our results and answers on the correct, self-checked answer? Building systems that can do that cross-checking, that can effectively fact-check misinformation, whether it's geographic, visual, or text on the web, is going to be very critical. The language models at the moment are suffering both sides of it, but I think there could be a path to applying these models to detect and flag incorrect information, and if they can do that well enough, we can start to build up a better data set. Wikipedia is the embodiment of this in some sense: people are editing this model of the world.

There's enough editing and enough process that, for the most part, most of the time, Wikipedia is correct. If somebody had said that 20 years ago, we wouldn't have believed it, but humans have self-corrected Wikipedia into perhaps the best source of truth. And I think we're going to need to do that far beyond the level of Wikipedia, whether it's science information, political information, or geographic information. We're going to need to build tools to self-correct the mistakes in the world.

I love that. And certainly you have an advantage on the geospatial side, where in most cases ground truth is actually easier to find than in a lot of these other domains.

This is true.

Brian, thank you so much for joining us.

It's been great to be here, Bilal. I'm very happy with our shared experience. Between the two of us, we've had such a long history with the creation of map data at Google, and I really appreciate the conversations we have together.

All right. So let me tell you, that conversation with Brian really got my gears turning.

It's wild to see how Niantic flipped the script on mapping. Google, Apple, Microsoft, they all build maps from the top down. Satellites, planes, cars, you know the drill. Niantic? They're like, "Hold my Pokeball." They're building from the bottom up, turning millions of Pokemon Go players into a global mapping party. Talk about harnessing the power of games and motivated communities for real-world impact.

This isn't just changing how maps are made, it's changing what maps can become. We're moving from static snapshots that are updated yearly towards this dynamic, near real-time understanding of our world.

And these aren't just maps for humans navigating to their coffee shop. They're maps for machines to understand where they are in 3D space, whether that's your phone, AR glasses, or even autonomous vehicles. What's particularly mind-blowing is how Niantic is bringing cutting-edge tech like Gaussian splatting to the phone in your pocket. Suddenly, anyone can create photorealistic 3D captures of any space or object they care about.

It's literally like having a memory capture device in your hand. And while much of the world is focused on large language models, Niantic's focus on large geospatial models is incredibly intriguing. They're taking all these islands of spatial understanding their community has already created, all these Pokestops that have been scanned, and fusing them together, giving AI the same intuitive understanding of a place as the best GeoGuessr players.

This, to me, is the foundational substrate connecting our digital and physical worlds. Dare I say the bedrock of the metaverse.

While I have no doubt that AR will eventually replace our phones, I'm incredibly excited about the future of on-device AI. It's amazing to think that the neural radiance field technology I talked about in my 2023 TED Talk is now doable, not in a massive data center, but right there on your phone, all without your data ever leaving the device. That's a huge win for privacy and user control.

And Brian's emphasis on shared experiences, well, it really resonated with me. We're already so lost in our digital bubbles, but AR powered by these incredible 3D maps can help us reconnect with the physical world, with the people and places that matter the most to us. And let's not forget the massive potential for enterprise applications. AI powered tools that allow us to annotate places, collaborate remotely, and virtually teleport to any location.

Even with the current AR/VR headsets, the possibilities are transformative. When I take a step back, it seems the future of maps isn't just about better technology. It's about better connections. As we push the boundaries of spatial computing, Niantic is showing us that the real power lies not in building perfect 3D models or precise positioning, but in creating tools that bring us closer together.

Tools that help us rediscover the magic in our physical world. Now that's the kind of future we should be excited to help build. All right, folks, this is the last episode of season one of the TED AI show. Over the past 25 episodes, we've embarked on an amazing journey exploring a world where AI is changing everything.

From deepfakes challenging our sense of reality to the dramatic OpenAI board saga unfolding in real time, we've truly witnessed AI permeate every aspect of our lives. We've ventured into territories where AI is becoming deeply personal.

From AI NPCs as companions to therapy bots and mind-reading interfaces, we've examined AI's growing influence on global systems, from predicting the weather to transforming education, from UN governance frameworks to national security considerations. And perhaps most fascinatingly, we've explored AI's relationship with human creativity and consciousness.


As we close out this chapter of the TED AI Show, I'm excited to share that my journey with TED is evolving. I'll be moving into a guest curator role, bringing cutting-edge voices in technology and AI to TED's global stage. For those curious about what's next and wanting to continue exploring these frontiers together, you can find me sharing my insights on X and LinkedIn under my name, Bilal Volsadu. Thank you for being a part of these conversations. They've been foundational for what comes next.

The TED AI Show is a part of the TED Audio Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Girard and Alex Higgins. Our editor is Banban Sheng. Our showrunner is Ivana Tucker. And our engineer is Asia Pilar Simpson. Our researcher and fact checker is Christian Aparta. Our technical director is Jacob Winnick. And our executive producer is Eliza Smith.

But don't worry, this isn't goodbye. I'll see y'all in the next one. This time, not as the host of the show, but as the guest.
