
An Apple Intelligence Wish List

2025/1/19

AppStories

People
Federico Viticci, John Voorhees

Topics
Federico Viticci: I'd like to see a more conversational Siri in iOS 19, one that can carry on ongoing natural-language conversations the way ChatGPT does. Apple is behind its competitors on AI, but I think they have to ship a Siri LLM. Apple's advantage is that they own the platform and the ecosystem: Siri and Shortcuts can access data from my apps, such as Obsidian and Apple Mail. I think Apple may ship multiple versions of a Siri LLM, and may even rename Shortcuts to Agents. Overall, I see Apple's AI as an opportunity to rethink how we use our computers.

John Voorhees: I think there's a lot to look forward to with Apple Intelligence. Its current ChatGPT integration isn't a good solution because it lacks persistence and memory. Apple's Siri will become another interface to the operating system, and the system itself will incorporate AI. I'd like to see a dedicated research and writing tool integration that supports existing productivity apps, and more of the system exposed as Shortcuts actions, especially on the Mac. I think Apple has an opportunity to do this better than other companies.



Hello and welcome to AppStories. I'm John Voorhees and with me is Federico Viticci. Hey, Federico. Hello, John. How are you? I'm doing really well. Happy Friday. We are recording this on a Friday. This is, you know, the behind the scenes bit of the show, I think. You know that, I don't know if I ever told you this. You guys in America, you have Friday the 13th as your unlucky day. Right. In Italy, we have Friday the 17th.

Oh, just because you want to be a little different there in Italy. Is that the thing? You want to do your own... We say in Italy, we say, Venerdì 17. It will be Friday the 17th. That's our national unlucky day. I don't really believe in these things, but a lot of people do. You see like... You actually see like a lot of people like not doing particular things on Friday. Like not leaving for a trip, for example. Not taking the bus, for example.

Not riding a bicycle. Just kind of staying home. Yeah. I don't want to do anything today. I don't really believe in that. And obviously, you know, I say that, but like, you know, knock on wood, there's still a few hours left. Yes. Yes. It's still early in the day. Well, be careful out there, Federico. We hope nothing happens.

It's like people think that black cats are a sign of misfortune. But I think black cats are beautiful. All cats are beautiful. And dogs, obviously. That being said...

We talked about Android and some specific Apple-related things in the pre-show for AppStories+ members. We have a completely different topic in the main show, which is: we're early in the year, so maybe, you know, it's time to start thinking about the things we're going to see in June. And I have a bit of an Apple Intelligence wish list.

That I wanted to share. Very good. Very good. I think that there are a lot of things to wish for with Apple Intelligence, because what we've gotten so far is, you know, just the surface stuff. I think the big ticket things are yet to come. And I think we should add to that big ticket list.

I think we've each put together a list of things we would like to see, and we're going to Connected-style round-robin this. I am going to start right off the bat with the first one, which is I hope that this year in iOS...

19? Yes, 19 this year. I hope that in iOS 19, we are going to see a more conversational Siri, in the sense of a credible

ChatGPT alternative in the form of a Siri app that lets you use a large language model to have ongoing persistent conversations with Siri, just like you can have conversations in natural language with ChatGPT. I hope Apple will showcase, and I don't think they're going to release it this year. I think it's going to be a 2026 thing. But I hope we're going to see this sort of experience. I think this is complicated because...

Because on one hand, I think it's going to be a tall order for Apple to even match the functionality of ChatGPT from two years ago. We know that, you know, it's pretty much established that Apple intelligence is roughly two years behind the competition. You know, you look at ChatGPT, you look at Gemini, you look at

Claude. And it's very clear that the Apple intelligence that we have today is basically where the rest of the industry was in 2023, arguably even 2022. Right. And right now, none of it at all is baked into Siri because just yesterday I asked Siri, what is, what's the weather going to be in two hours? And instead of telling me what the weather was going to be in two hours, it told me what the weather was going to be at 2 PM, which was three hours before the time I was asking. So why not? Why not?

Why not? Why not? And so I'm going to lump this together with the next item in my list, which obviously this sort of experience that I imagine has been rumored by Mark Gurman at Bloomberg. The idea of a Siri LLM, the idea of a Siri large language model that allows you to chat and to have longer ongoing conversations, persistent conversations in natural language with Siri. I think...

Even though I said it's going to be a tall order for Apple to do it, the thing is, I believe Apple has to do it. You look at ChatGPT, you look at the App Store, ChatGPT is consistently in the top five most downloaded free apps on the App Store. Right. And I'm going to wager that it's the case everywhere. I look at the US App Store, I think it's the same everywhere. ChatGPT, the ChatGPT app has become this

wildly popular global phenomenon that I see on everybody's iPhones these days. Yeah, and the fact of the matter is that the fact that Apple Intelligence incorporates ChatGPT isn't a good solution right now because it doesn't have the same kind of persistence and memory and continuity that you're talking about that the app has.

And it's not frictionless, right? It's built for these sort of ephemeral requests where you ask one thing and Siri realizes, oh no, I don't know how to answer this. Do you want to go to ChatGPT? And you go to ChatGPT, but it's not the full experience. And at that point...

I think you're better served by actually opening the ChatGPT app on your phone and having a proper conversation there with more features and faster speeds. So this has a lot of potential ramifications. The first one, obviously, if Apple is doing a large language model,

What's it going to look like? Look at the rest of the AI industry. Look at OpenAI, look at Google, look at the open source scene, look at Llama, look at Qwen coming from China. All these companies are releasing either closed source or open source models, and they usually release a collection of models. Models, I'm going to simplify here,

contain a certain number of parameters and they support a variety of context windows. The context window being the amount of text that can be stored in the same conversation with a large language model. OpenAI, Google, Anthropic, they have different flavors of

of their models, of ChatGPT, Gemini, and Claude. There's usually a bigger model that contains a lot of parameters, has been trained on a lot of text, and supports a very large context window. Famously, Google with Gemini is the only company, I think at the moment, to offer a one million token context window, which is much larger compared to ChatGPT and Claude.
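Since the context window comes up a few times in this discussion, here is a toy sketch of the idea: a model can only attend to a fixed budget of tokens, so a long conversation has to be trimmed (or summarized) to fit. Nothing here reflects a real tokenizer; the 1.3 tokens-per-word figure is a rough assumption for illustration only.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 1.3 tokens per whitespace-separated word.
    # Real tokenizers (BPE, SentencePiece, etc.) behave differently.
    return int(len(text.split()) * 1.3)

def fit_to_context(messages: list[str], context_window: int) -> list[str]:
    """Keep the most recent messages that fit inside the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # newest first
        cost = estimate_tokens(msg)
        if used + cost > context_window:
            break                           # older messages get dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))             # restore chronological order

history = ["hello " * 100, "how are you " * 50, "short question"]
trimmed = fit_to_context(history, context_window=200)
```

A larger context window simply raises the budget, which is why a one-million-token window matters for the long, persistent conversations being discussed here.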

I'm thinking, if Apple is going to do, so let's say Apple is going to come out and say, we have a Siri LLM. First of all,

Are they going to follow the rest of the industry in having multiple versions of Siri LLM? Because I could see a scenario in which Apple, following their own well-established and much-copied naming pattern (just look at Dell), does Siri LLM Mini, Siri LLM, Siri LLM Pro, and Siri LLM Pro Max. Yeah, maybe. Maybe. I...

I think what they're going to do is they will have multiple models. They're just not going to tell you about it because I think... Ah, interesting. Because it'll be simpler. It's a marketing thing. The mini one will be on your phone. It'll be local and it'll be used whenever possible to get you answers really quickly. And then the bigger ones, it'll go out to the web and it'll go out to that cloud compute that Apple's implementing. Yes, I think you're right. I think you're right. I think we could see a scenario in which a, I don't know, a 1.5...

1.5-gigabyte model, the mini model, is stored offline on your phone. Right. And that could be Siri LLM Mini. It works offline and it handles basic requests. But for other, bigger models, it goes off to Private Cloud Compute. And so it requires an internet connection.
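The split being speculated about here can be sketched as a simple router: requests a small on-device model can handle stay local, and everything else goes to a larger server-side model. All of the names, intents, and thresholds below are invented for illustration; this is not Apple's actual design.

```python
# Hypothetical set of intents a small on-device model could serve offline.
LOCAL_CAPABLE = {"set_timer", "weather", "unit_conversion"}

def route_request(intent: str, needs_long_context: bool) -> str:
    """Return which (hypothetical) model tier should serve a request."""
    if intent in LOCAL_CAPABLE and not needs_long_context:
        return "on-device-mini"        # small local model, works offline
    return "private-cloud-large"       # bigger model, needs a connection

print(route_request("weather", needs_long_context=False))  # on-device-mini
```

The appeal of hiding this routing from the user, as suggested above, is that the same request interface works whether or not the device happens to be online.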

I think we're years away from this sort of model. I think it's going to be a 2026 thing. That's not even to mention a reasoning model, which I think if Apple is ever going to have a reasoning model, that's going to be like at the very least like a 2027 thing. Right. But,

Additional questions. Are they going to support multiple modalities? By modality, I mean the type of input, the type of interaction that you want to have with a Siri LLM. Are you going to be able to talk to it in real time like you can with Gemini or ChatGPT Voice? Are they going to support text? That seems pretty much a given. Are they going to support image attachments? Are they going to support local document attachments? And the other big question that I have,

The obvious advantage that Apple has compared to the competition is the fact that they own the platform. They own the ecosystem. They own the devices that we use. And that means that Siri and Shortcuts, all these features, Apple Intelligence as a whole, has access to your apps, to data from your apps.

The thing that ChatGPT cannot do is look into my reminders, look into my Apple Notes, look into my Obsidian. Theoretically, that's something that Apple Intelligence could do. That's something that a Siri large language model could do. And when I think about this and I think about the potential scenario in which Apple is coming out with the Siri LLM, I get very excited about the idea of chatting with Siri about the contents of my Obsidian or talking to Siri about the contents of Apple Mail.
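Mechanically, "chatting with Siri about the contents of my Obsidian" is retrieval: find the notes relevant to a question and hand them to the model as context. A toy keyword-overlap sketch of that step; real systems use embeddings, and the note contents here are invented.

```python
def score(question: str, note: str) -> int:
    """Count how many words from the question appear in the note."""
    q_words = set(question.lower().split())
    return len(q_words & set(note.lower().split()))

def retrieve(question: str, notes: dict[str, str], top_n: int = 1) -> list[str]:
    """Return the titles of the notes most relevant to the question."""
    ranked = sorted(notes, key=lambda t: score(question, notes[t]), reverse=True)
    return ranked[:top_n]

# Hypothetical note library standing in for an Obsidian vault.
notes = {
    "trip.md": "flight to Rome on March 3 hotel near the station",
    "review.md": "macOS review outline drafts and screenshots",
}
best = retrieve("when is my flight to Rome", notes)
```

The retrieved note text would then be placed into the model's context window, which is what lets an assistant answer from personal data it was never trained on.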

Yeah, look, I think that where we're heading with this is that Siri, for Apple, is going to become another interface for the OS itself. There will be apps that will heavily incorporate Apple,

you know, artificial intelligence, but the system itself is going to incorporate it too. So it's not just going to be go to your Siri AI app. It'll be go to maps and ask it a question that is more, you know, the kind of question you'd ask to an LLM and find out information about locations around you. Or you do the same in notes or reminders or whatever app it happens to be. And this kind of feeds into my first thing, which is,

I want to see a dedicated research and writing tool integration. Now, I could see this being a separate app, a little bit like NotebookLM that Google has. Something along those lines, where you can, you know, multimodally add documents and audio and other things to a collection of things in a project and then use it to mine it for information for research.

But I could also see it just becoming like a sidebar in something like Pages, for instance, or Freeform, something like that, where, you know, you've got other information that you're creating, whether it's text

or a mind map, or maybe a spreadsheet in numbers, and you want to incorporate other information where a large language model would be useful. So I think as a research tool that supports the existing productivity apps, that could be really powerful. Yeah, as you were mentioning this, I thought the idea of potentially

Sort of like being able to have a super brain where you could reference, almost like deep link, specific points of multiple conversations that you've had with the Siri LLM. Like, let's say you're working on a project. Right. That's a good point.

You're working on a project, you're working on your next macOS review, and you want to reference, maybe at some point you had a conversation with Siri about what are the different names that Apple has used for macOS versions. And then at another point, you ask, hey, when did Apple add the new design that came with OS X Yosemite? But you have these multiple conversations, and being able to reference those points of multiple conversations in the same project, that's something that...

Something like that, that kind of instant and total recall, is something that the human brain doesn't really support. I mean, do you remember, John, exactly how every single conversation in your life has gone? No, no, obviously not. Right. You remember the gist of it. You remember maybe some of the ideas, but obviously the human brain doesn't scale like that. Right. So the ability to have that sort of research tool,

based on your data, I think it's like the ultimate assistive AI tool. Yeah, I think it is. And as a result, you'd also want to have it built into Safari as well. I mean, just the other day, I was writing a story for the club all about...

the camera, and the rumor that there's going to be a 48-megapixel camera sensor in the iPhone SE. I was trying to compare which sensors were incorporated into which Apple devices on what dates, and it's all over the map. Whether you're talking about the iPad, the Mac's FaceTime camera, or the iPhone's rear or front-facing cameras, there are a ton of camera sensors in all those devices.

And plotting when the 48 megapixel part came into view and where things are heading on each of those platforms is something that's just like not in my brain because there's too many different factors and different models to keep track

of, you know, in one place. So I was using ChatGPT for that, and it was perfect, because then I could get the links and I would just go straight to the Apple tech specs to confirm that it wasn't making anything up. But it was just a really shorthand way and very quick way to research as opposed to, like, scrolling through page after page of tech specs,

finding the tech specs pages in the first place and then going through them and comparing them side by side. I could very quickly get a summary, click through to those links, verify everything was right, and move on with my writing. It was much faster than traditional web search that way. Yeah, yeah. I think, um,

If Apple just wants to copy ChatGPT and have something that you can talk to and ask questions about web sources or general knowledge, I think that's fine. But I think the real potential is something along the lines of a mix between ChatGPT and what Microsoft is doing with their Recall feature, which is sort of...

available now. It was like off to this pretty much terrible launch where it was like a privacy nightmare. And then Microsoft sort of went back to the drawing board for that. But the idea of

Being able to privately and securely access data on your computers and interact with apps, because that's the other thing. I really want to see what Apple does with these App Intents, with this ability to ask Siri to perform actions in your third-party apps without having to put together a shortcut first. I'm going to get to Shortcuts in a minute. But I...

I get very excited about this idea of Apple actually catching up with the competition and actually eclipsing the competition in a way that only they and maybe Google can do because obviously Google owns Android. And I could see Google sort of beating Apple to the punch at Google I.O. with similar features in May, like having a framework to actually plug Gemini into all of your Android apps. And Chrome OS, too, for that matter. And Chrome OS.

Yeah, I could see Google. And I mean, Google has pretty much said, we're going to do agents in Google Chrome. But yeah, I think it's an exciting space to watch this year. And I hope we go beyond, you know, we saw Image Playground and Photos' Clean Up and Image Wand, these

pretty rudimentary tools with the first wave of Apple Intelligence features. I hope we go beyond that. Yeah, I hope it goes beyond voice too, because I think voice is an important aspect of this, and maybe this gets a little bit to what you're going to talk about with Shortcuts. But I think that there is a place for one-off actions, asking Siri to do things on your behalf with your apps and your data on your computer or your iPhone or wherever. But I also think

Siri, and Siri as an AI, could be a big help in actually creating reusable automations, you know, multi-step things that you don't want to have to ask it to do all over again. I mean, that's something I've been doing with ChatGPT. It's a really good tool for helping work your way through a shortcut idea,

you know, telling you what the actions are. It does hallucinate a fair amount with Shortcuts, I think, because it's kind of a visual app.

But I think that's something that Apple could do really well with shortcuts is use it as a tool for helping people, walking people through the steps of finding the actions, whether they're the built-in ones or whether they're third-party ones that are available, either from apps that someone's already installed or maybe even ones that it knows about because it has the entire database of the App Store out there and what's available in terms of shortcut actions to give people suggestions about

about apps they might want to use to support their automations. Yeah. I think I mentioned this before. I am convinced, and this is just my personal theory, that the spin Apple is going to put on the AI agents that everybody's doing now is going to be Shortcuts. I'm convinced that at some point we will be able to create an agent that is basically a fancy new kind of...

Personal automation, essentially. The whole idea of agents is to use natural language and a large language model to have it perform

something in the background for you. And we're seeing these kinds of agents in IDEs and in web browsers where you tell the agent to do one thing, like book me a table at this restaurant. And it's literally a large language model using image recognition and UI scripting to click around the web browser and do things for you and then ask you for confirmation.

I think Apple has the potential here to actually do native actions, because they own the app and the framework, Shortcuts and App Intents, to perform tasks, to do stuff in your favorite apps. But the interface is clunky, and personal automations are even clunkier, and they're not well supported across platforms. They are still iOS- and iPadOS-only and they do not sync.

I could see a scenario in which Apple says, "You can create agents with Apple Intelligence. They're based on Shortcuts."

I could see Apple renaming Shortcuts to Agents, even, in their quest to catch up and to put up a good face and be like, we do AI. I could see Apple renaming Shortcuts to Agents. How about Apple Agents right there? We got Apple Intelligence, we have Apple Agents, right? We got Apple Agents.

I could see that sort of feature becoming something based on Shortcuts and based on App Intents, where you say, I want an agent that... Obviously, this is going to require a lot more App Intents and a lot more triggers and features that do not exist in Shortcuts now. Right. For example, an agent that says, whenever I get an email from John,

Mark it as important. Or whenever I save a PDF to this folder, rename it with the current day. Something along those lines. And you can apply this to Photos, to Safari, to Mail, to Calendar, to whatever you want. Anything that integrates with Shortcuts is a prime candidate for becoming a little agent that lives on your devices, that is powered by Apple Intelligence and powered by a Siri large language model.
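The examples here are classic trigger/condition/action rules, the same shape automation tools have used for years. A minimal sketch of that shape; the event names, payload fields, and addresses are invented for illustration, not real App Intents.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    trigger: str          # event name the agent listens for
    condition: callable   # predicate over the event payload
    action: callable      # what to do when the condition matches
    log: list = field(default_factory=list)

    def handle(self, event: str, payload: dict):
        # Run the action only for matching events that pass the condition.
        if event == self.trigger and self.condition(payload):
            self.log.append(self.action(payload))

# "Whenever I get an email from John, mark it as important."
mark_johns_mail = Agent(
    trigger="email_received",
    condition=lambda e: e["from"] == "john@example.com",
    action=lambda e: f"marked '{e['subject']}' as important",
)

mark_johns_mail.handle("email_received",
                       {"from": "john@example.com", "subject": "Draft"})
```

The speculation above is that an LLM would assemble rules like these from a natural-language request, with App Intents supplying the triggers and actions.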

Yeah, I think that's a great vision. I think that I'd love to see that too. The problem I have right now is that there are so many aspects of the system that have not been turned into actions already, especially when you look at the Mac. I mean, these agents are going to be powerful everywhere, but I feel like Shortcuts really is still stuck in the mud when it comes to macOS. Yeah, yeah.

So the second part of my agentic wish list is I hope that we see more and more from Apple Intelligence in terms of

multitasking and productivity. It's nice to have conversations. It's nice to have photo-related AI features. I think there's a lot of potential. And again, Apple is well-suited for this sort of thing because they make computers. OpenAI doesn't make computers. Google does. And so I think Google is actually Apple's main competitor in this space. But for example,

Allow me to spend less time managing my files, renaming files, organizing them in folders. Allow me to use a large language model to, I don't know, assemble a workspace for me. And yes, I know nerds will always want the ability to manually arrange their windows, move them around. You know, John Siracusa is going to lose his mind. And I'm probably still going to want to carefully arrange my windows in Stage Manager. But the idea of, like, saying,

assistant, when I sit down to work, open the windows to let me browse the web and take a note, and also show me the calendar on the side. Like, instead of having to recreate that workspace every single time. Essentially, what I think Apple is going to do, I think this whole Apple Intelligence thing and this whole AI catch-up session that Apple is going to have for the next few years,

I think is a new kind of fuel that will allow Apple to think of new features across the system. Past few years, if you think about it, what have we really seen in iOS and iPadOS that's

truly new beyond customization? Like, there's only so much you can customize. Now we have wallpapers, we have lock screens. Most of the actual OS innovation has happened in customization for the past few years. I think this whole thing with AI is an opportunity for Apple to fundamentally rethink how we operate our computers. And so I hope they go beyond Genmoji and cutesy,

horrible images and actually embrace productivity, how people are actually working with their computers.

Yeah, I agree with you on that. I mean, here's the thing. I mean, I know you say that maybe the nerds will want to arrange things just so, but I think that AI can help there too, because what you should be able to do is arrange that screen exactly the way you want it and then tell your Siri AI, this is the way I want to work in Safari and Obsidian every time. Create, you know,

save that for me, and then be able to launch it and repeat it, and have it repeatable every single time. That's essentially what people do with things like Keyboard Maestro and, you know, BetterTouchTool on the Mac, for instance. So I think that makes a lot of sense. I also feel like

there's a lot of opportunity for working with the apps and the data that are already on your devices. But I'd also like to see this AI be able to work with data that's out on the internet. So I want to be able to say, you know, I want to go to a website and see a giant table of information and say, you know, Siri, create a spreadsheet for me using the data in this table on this website. You know, that kind of thing, whether it's, you know,

information that's displayed on a web page, whether it's a file that's linked on a web page, whatever it happens to be, be able to do that and create that stuff for me. Or tell it, go to this website where there's API documentation and explain to me how I can integrate this with Shortcuts. Because I think those kinds of things are going to be really powerful.
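The "make a spreadsheet from the table on this web page" wish is, at its core, HTML-table extraction. A stdlib-only sketch of that step, assuming a simple, well-formed table; the sample HTML is invented:

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect rows of cell text from <table> markup."""
    def __init__(self):
        super().__init__()
        self.rows, self.row, self.in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []                 # start a fresh row
        elif tag in ("td", "th"):
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self.row:
            self.rows.append(self.row)    # finished row
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.row.append(data.strip())

html = ("<table><tr><th>Device</th><th>Sensor</th></tr>"
        "<tr><td>iPhone</td><td>48 MP</td></tr></table>")
p = TableParser()
p.feed(html)
csv_lines = [",".join(r) for r in p.rows]   # spreadsheet-ready CSV rows
```

An assistant with OS-level access could do this on the page you are already looking at and hand the result straight to Numbers, which is the appeal being described.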

There's a developer angle here too, which is I know that a lot of developers I've talked to aren't super keen on Swift Assist, which is the LLM, you know, auto-completion of Swift code that has been added to Xcode. And so I think there's a lot of work that can be done there because we're seeing apps like Cursor, which is incredibly powerful and also very popular,

So I think, you know, just as Apple is a little bit behind on the large language models in general, it's also behind in kind of the code aspect of this. And that is important to keep developers happy. So I'd like to see that moving forward quickly too. Yeah, developer tools are changing. It's just a fact. You just look at what people are doing with Cursor. It's wild. Speaking of developers, here's a question for you.

Is there going to be a Siri LLM API? Because right now, there really isn't an Apple Intelligence API. I would think that at the very least, there should be a native API for apps to handle processing of data. And, you know, just look at how many iOS apps you see that have a ChatGPT integration that maybe lets you rephrase text in an automatic fashion. Right.

I think there should be a Siri LLM API. And here's a follow-up question. Is there going to be a web API? Are you going to be able to make web requests to a Siri LLM or not? Yeah, that would be interesting. I think there are a couple aspects of this that I could see. One is...

is structuring your data for your apps in a way that is optimized for a Siri LLM. Just giving it a little bit of a head start as to what kind of data it is and the various ways it can be used with the different actions that are available to the LLM, whether it's through Shortcuts or

App Intents or whatever. That's part of it. That's basically what they've done with the App Intents schemas and domains that they have. Right. But I think that needs to expand, because right now we're at maybe a dozen or so categories. I think 12 categories they have, yeah. Yeah, so that needs to expand. But you're right. It's the kind of thing where it's not just about structuring the data in the apps. It's a question of whether third parties can use

Siri itself as a way to optimize things or use it inside their app. Because, you know, App Intents are more about sharing between things. This would be more an API for acting on the data that's already in your app. So, I mean, we'll see. We'll see. Yeah. My final thing, it's just, it's a more simple thing.

I hope we see some kind of Pixelmator or Photomator-inspired functionality find its way to the Photos app with some new AI features, like the super resolution features that they have, the machine learning adjustments that they have in Pixelmator and Photomator. I think those would make for pretty nice additions to Photos.

I really want to see what Apple does with those apps that they now own this year. Yeah. Yeah. I think Photos has some of that built into it already, but you're right. I think Pixelmator, Pixelmator Pro in particular, goes a step further when it comes to things like resolution and some of the other tools that are baked in there. So it'll be interesting to see how that gets incorporated. And, you know, there are things that Apple could do. Maybe, I don't know, expand how tagging works.

It's different kinds of automation to take kind of the tedium out of managing a photo library, basically. Yeah. So that's my wishlist. And, you know, watch as none of this happens and Apple instead does Image Playgrounds 2. So.

Yeah, we'll see. We'll see. I mean, there's a lot to be done. And I think you're right. I think it's good to set expectations: this is not just a 2025 thing. I think that you and I will be talking about this for the next two, three years and beyond. So it's not slowing down. And hopefully Apple can kind of both catch up

but also innovate itself and do some things that some of these other companies haven't done yet. I think it's time, especially, and I think Apple is particularly well poised to break free from the chatbot. I think a chatbot has a place in these systems, but

But I don't think the chatbot is necessarily the solution for all UI for an LLM. And with access, obviously, to the OSes themselves, Apple has an opportunity to do that better than, I think, any of these other companies. Here's an idea: a foldable iPhone that opens itself thanks to AI. Okay, Federico.

It launches Vivaldi in the browser. It opens and closes with AI. It's a robot iPhone. Then you tell it to install Android on itself and become a Pixel phone. I don't know.

All right. We better end it there, Federico, I think. You can find both of us over at MacStories.net. We have Club MacStories, too. That is our subscription service. Club members get all kinds of stuff. They get weekly and monthly newsletters and, depending on your tier, you can join our Club Discord. We do special things with our podcasts, MacStories Unwind+, as well as the Plus version of this show, which comes out early and is extended and ad-free. So there's lots of great perks there. Check out all the details at plus.club. And of course you can find us on our other shows, MacStories Unwind and NPC: Next Portable Console. Talk to you next week, Federico. Ciao, John.