New from Stability AI: Generative Music Model

2025/5/27

ChatGPT: OpenAI, Sam Altman, AI, Joe Rogan, Artificial Intelligence, Practical AI

People

None

Anonymous

Topics

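Anonymous: Stability AI recently released a new audio generation model that focuses on music creation rather than vocals. The model aims to avoid copyright problems by training only on material whose rights are fully cleared. Compared with competitors such as Suno and Udio, Stability AI's model is smaller and can run on a phone, but its output quality is relatively lower. Even so, the model performs well at generating short audio samples and sound effects, and it carries no copyright risk. It also has limitations: it accepts only English prompts, the range of musical styles it generates is limited, and commercial use is subject to certain restrictions. Overall, I think Stability AI's new model is a positive attempt at copyright compliance, but there is still room for improvement on the technical side, especially in generating high-quality, diverse music. Personally, I lean toward Suno or Udio because they are better on music generation quality, despite the risk of copyright disputes.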

Deep Dive

Chapters

This chapter explores Stability AI's latest music generation model, its features, and how it compares to competitors like Suno and Udio. The discussion focuses on the model's size, speed, and copyright-free nature.

Stability AI released a new music generation model.
The model is lightweight and runs on smartphones.
It uses royalty-free audio libraries to avoid copyright issues.
While not as high-quality as competitors, it's faster and more accessible.

Transcript

Today on the podcast, we're going to be talking about Stability AI and a brand new feature that has just rolled out: the ability to do audio. This is an update they rolled out recently. Stability is kind of an interesting company. You'll probably remember it as one of the leaders in the AI revolution. They literally invented Stable Diffusion and the way that we use AI to generate images.

And yet they really got left behind as a company that's had a lot of financial issues. But I think that they're about to make a big turnaround, so I don't think it's a company that you should count out just yet.

The one thing I did want to mention before we get into this, if you haven't tried it already: my startup, AIbox.ai, has officially launched, and our first product is the AI Box Playground. We have a beta out right now that essentially allows you to access the top 20 AI models all on one platform. You can chat with them all in the same chat. We have audio, image, and text all in the same chat for $20 a month. So you don't have to have subscriptions to 20 different platforms. You pay for one subscription and you get access to all of them. So you can check it out. The link's in the description, AIbox.ai.

All right, let's get into what's happening with Stability AI. I should preface this by saying that they made a big announcement about an audio model, but this isn't a vocal model. This is a music model; specifically, it does music. There are a bunch of competitors doing this, like Suno and Udio, but people criticize most of these music generators over copyright. The complaint is: look, these guys grabbed all of this data from the internet, they grabbed everyone's music, they trained a model, and now it creates music. So people are upset about the copyright status of the data set.

Stability tried to avoid this, and they did a couple of cool things. Number one, it's a really lightweight, small model that can actually run on your phone. Suno and Udio have apps that run on your phone, but obviously those go up to the cloud and run on their own servers, so you have to have access to the internet. With this model, you technically could do everything on your phone. Your phone is powerful enough to run it and generate audio for you. Now, I will put a caveat on this by saying that it is not as good as Suno or Udio. That's just the nature of the beast.

Stability trained this only on content that they had the rights to, which is fantastic, right? They don't want any sort of IP risk involved when they release it. They said the training data comes entirely from royalty-free audio libraries, the Free Music Archive, and Freesound. Those are their sources, and they're allowed to use them, which is technically great, except that the model is not as good. That's the big thing, I think.

It is really small: 341 million parameters in size, and it was specifically optimized to run on ARM CPUs. ARM makes chips, and ARM CPUs are often put into phones, so this model was essentially built to run on an ARM CPU right on a phone. What it's specifically made for, though, is quick, shorter audio samples and sound effects. You can do drums, instruments, and riffs, and it can make up to 11 seconds of audio. You can do it on a smartphone, and it takes about eight seconds. So this is definitely faster than your average Udio or Suno piece.

But, and I'm not saying it's bad (I actually think it's fairly decent for what it can do), it doesn't do vocals. So if you're trying to make a fully fledged song, or honestly a really great song, Suno and Udio are going to do a much better job of making music, in my opinion. I've extensively tried Suno, and it does incredible work and makes amazing music. People criticize that it was trained on copyrighted data. I'm not too concerned about that; that's not really my problem. I'm sure people will get mad at me or criticize me for that, but my opinion is that it's their copyright issue to deal with. The model is so much better, and as a user, a consumer, and someone who would like to create things, I'm going to use the best model. That's what I get out of Suno or Udio.

All right, I wanted to give you a sample, though, because I'm actually quite impressed by what they have been able to produce. It's completely copyright free; there are no issues there. They have a couple of samples of what it's actually able to do. You can go online and check out SoundCloud, where they've got a bunch of different samples.

All of their samples are much shorter, but they show you exactly what it's capable of doing: some drums, some music. They have a bunch of limitations in addition to the ones I've mentioned already. One, it only handles prompts written in English. So if you speak another language, you'd have to translate your prompts into English with Google Translate or something like that.

It can't generate realistic vocals or high-quality songs; the output is kind of low quality. And it doesn't do a lot of different musical styles. It was really built on what they call Western-biased training data. These free music libraries are not very extensive; they're mostly Western music.

It also has somewhat restrictive usage terms. It's not the end of the world; you've got to make money somewhere. It's free for researchers, hobbyists, and businesses that make less than a million dollars in annual revenue. But if you're making over a million dollars, you have to pay for Stability's enterprise license. I think this is a pretty standard kind of licensing deal. Although, yeah, it feels like they'd be making something open source, so I guess some people are upset about that.

Now, Stability AI is a company that has had a ton of issues in the past. They raised some new money last year. A bunch of their investors, including Eric Schmidt from Google and the Napster founder Sean Parker, who famously invested in Meta, were really trying to turn the business around. Emad Mostaque was their co-founder and former CEO. He apparently really mismanaged their finances and almost completely destroyed the company. Tons of staff resigned. A partnership they had with Canva fell through. Investors were super concerned about this. So in the last few months, they actually got a new CEO, and they appointed James Cameron to their board of directors, which is interesting because typically this has been famous as an image company.

With James Cameron, you can kind of imagine where they're going with this: it's going to become a video company. All these AI-generated images are perfectly poised to lead into AI-generated videos. And they've also released a bunch of new image generation models. So it seems like Stability is on track to do some cool things. Specifically, if we're looking at video, doing these sound effects and smaller music bits makes a lot of sense. If they're making videos, it'd be really cool to also have AI-generated music in the background. So this makes a lot of sense with their strategic direction.

I'll be super curious to see where they go. This is a very prolific company. It's raised a lot of money. It's done a lot of interesting things, but again, it has faced a lot of challenges. So I'll keep you up to date on everything happening with Stability. Make sure to leave a rating and review wherever you listen to your podcasts. And again, if you haven't tried AI Box already, there's a link in the description. I would love to have you try it. You can dump a ton of your subscriptions.

For $20 a month, you get access to all the top AI models. You can compare results from different models side by side. You can chat with all of the models in the same chat. You don't have to switch platforms or lose the ability to keep talking to different models. And it's a lot of fun. So check it out, AIbox.ai, and I will catch you next time.
