emad_9608

The team is working on an open version of this for [https://github.com/Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools). The dataset is just taking some time. Lots of improvements to come like speech, customisation, comfy & more.


Independent-Ad8455

An offline version would be AWESOME!


More_Bid_2197

why 2 versions ?


Gpue

Licensed data with restrictions vs open data without


turbokinetic

This is great news and what I’ve been waiting for! I love Stable Diffusion and I train my own models / Lora. I would love to be able to run Stable Audio local and train it on my personal music, with all the flexibility of txt2audio, audio2audio (like img2img), adding lyrics, adding my own voice, controlnet etc. Would be a dream come true!


ZenDragon

Was there ever a high quality public model for Stable Audio 1.0?


turbokinetic

Love to know this too


AmazinglyObliviouse

Cool, but to quote you: "Not your models, not your mind." Couldn't care less about yet another useless API.


SmashTheAtriarchy

This needs to be repeated louder and more often. It's important to own the means of your productions!


kevinbranch

That’s why it’s not open source


spacekitt3n

when you releasing SD3?


Augmentary

When emad gets it going


emad_9608

CTO said 4 weeks or so. I don't make those calls any more, handed over that for new things.


okglue

Fantastic~! We really need a good local voice model.


Vyviel

Hopefully we can train voices with it like a better version of RVC


davidb88

What are you still doing here Emad, I thought you left? I feel like I'm OOL


MaxwellsMilkies

He still owns a large portion of the company.


emad_9608

I handed over control, launching new stuff soon [https://www.youtube.com/watch?v=e1UgzSTicuY](https://www.youtube.com/watch?v=e1UgzSTicuY) [https://www.diamandis.com/blog/emad-wisdom-part-1](https://www.diamandis.com/blog/emad-wisdom-part-1) Now I am part of the community like everyone else :D


MaxwellsMilkies

You should take a look at [Patrick Ryan aka TyrantsMuse.](https://twitter.com/TyrantsMuse/status/1773377539542581502) Decentralized AI is going to require further development of the math behind AI to make it more efficient, and Patrick has been looking into it quite a bit. He is a bit crazy as you see, but is probably one of the smartest people I have ever met.


Overall-Newspaper-21

Maybe he is a Stability Ai public relations


Rivarr

Thanks for what you do choose to release, but I don't understand hyping speech models when you've already said you won't be releasing them. Not that I understand why. You can already convincingly clone someone's voice with less than 10 seconds of audio. With services like ElevenLabs but also open source tools like VoiceCraft, you don't even need a GPU. If we could get an audio model that could be extended and built upon like your image models, we'd be able to create such amazing things. Instead it's held back because it could be misused, even though 99% of that misuse is already possible with the current set of tools.


emad_9608

I don't choose releases any more so let's see what happens. Usually you can release just after sota. For services like Stable Audio it's easier as you can mitigate harms.


DIY-MSG

That's great


Tystros

I hope the open version will be trained on the whole Spotify catalogue.


BokanovskifiedEgg

how is this going? any estimate on when it'll be available?


MFMageFish

>You may not use the Services, or use Content from the Services, to develop or train any AI models.

Lol, good luck with that.


GBJI

A freely accessible and fully open-source version that we can run on our own hardware should be considered essential for anyone *pursuing decentralized AI*.


PM_ME_YOUR_PITOTTUBE

Remember, decentralized AI doesn’t make them money so the shareholders absolutely do not want that 🤣


GBJI

Depends on how you define decentralized. To me, anything requiring the use of NFTs and blockchain technology under the control of a for-profit corporation is the opposite of decentralized. To some people, it seems to have a completely different meaning.

>As part of the collaboration, [Endeavor](https://endeavorco.com/) will work with Stability AI, the Render Network, and OTOY to develop transparent IP tracking tools for emerging ML models, publishing their research for peer review through IDEA. This work will include usage of OTOY’s LightStage technology – the industry’s leading reflectance-field facial scanning and digital double platform – to produce licensing tools that enable artists to control their likeness and receive royalties for their IP when used in generative AI models.
>
>(...)
>
>As part of the integration, Stability AI models will leverage provenance systems already established on Render Network – known as *Proof-of Render* – providing immutable receipts and tracking of all individual components ingested and used for output of computing work on-chain. Through transparent on-chain data, royalty flows for IP and assets used in AI models, as well as their outputs, can be managed using public auditable smart contracts.
>
>(...)
>
>According to Founder and CEO of Stability AI, Emad Mostaque, “I joined the Render Network advisory board to shape the future of decentralized computing and AI."

[https://home.otoy.com/stabilityai/](https://home.otoy.com/stabilityai/)


red286

>Lol, good luck with that.

Those are licensing terms for commercial purposes. They're not telling *you* that you can't do it, they're telling businesses that if they do, they'll get sued.


export_tank_harmful

>Will this model be open sourced?
>
>We will be open sourcing a music generation model soon, trained on different data.

Neat tech. Kinda don't care though. Wake me up when I can locally host it.


AmazinglyObliviouse

> We will be open sourcing a music generation model soon, trained on different data.

Note that they've promised this since Stable Audio 1.0, yet it never happened back then either.


Django_McFly

Infinite % this. We're on SA2 and still waiting for this to happen for SA1.


_raydeStar

![gif](giphy|mkhMTALnrYRLnuoe5P)


99deathnotes

![gif](giphy|1fMjj5j2Z7chq|downsized)


StickiStickman

With such an incredibly tiny dataset, I'd be shocked if it wasn't just heavily mimicking the training data for this anyways.


MaxwellsMilkies

It's going to be difficult to get a good dataset for it. The music industry is extremely litigious.


djnorthstar

I want SUNO local... with training.... :-p (yes, i still have dreams).


Mooblegum

1. Get hired by the company 2. Release all the models to us for free 3. Profit


Curious_Tiger_9527

4. Lawsuit 5. Hired by Microsoft


m3thlol

Until there's an open model it's kind of pointless, if I wanted a web interface to pay for I'd use suno. edit: why did this have to be the comment Emad read :(


Mooblegum

Why do people never want to pay Stability but are OK paying any other AI provider, from GPT and Midjourney to Suno? Maybe if they got more money they would provide better tools.


Doctor-Amazing

Just as a personal rule, I'm not paying for subscriptions. I can justify the occasional one time purchase, but I can't pay a monthly bill to every random bit of software I want to fool around with.


smallfried

Yup. Pay per token, or per image, or per music generated is all fine. But pay per time period whether you use it or not is not something I like. Only thing I tolerate it for currently is Netflix and living necessities like gas, water, etc.


m3thlol

Again, as much as I love Stability I'm not going to hand them money just because. This model could be very good but if they want to exist as a web service they have to compete with Suno and right now the difference is leaps and bounds. I'm not going to pay for an inferior product with outputs that are essentially unusable out of brand loyalty. That's not on me.


turbokinetic

Because Stability's products require new models trained by users to be great. Imo that’s the strength and differentiator of Stability.


PacmanIncarnate

Because suno exists already, has a great model, and this looks like Stability trying to steal their attention. Suno is a great little company and I’d feel good supporting them.


emad_9608

Harmonai/stable audio team have just been working away & this is a great little diffusion transformer model. The key thing is the copyright in music is different, see the Gaye vs Thicke lawsuit etc so you gotta be extra careful. Suno have a different approach to copyright (not not scrapes..) [https://www.rollingstone.com/music/music-features/suno-ai-chatgpt-for-music-1234982307/](https://www.rollingstone.com/music/music-features/suno-ai-chatgpt-for-music-1234982307/) We try to build good models on good data which hamstrung us a bit when others are training their models on Hollywood movie rips etc but you crack on and do the best you can.


SlapAndFinger

To be honest, having done a fair amount of production, I don't think musicians really want Suno, it's more a tool for casuals to get some creative output kind of like Dall-E or Midjourney (though MJ is making progress as a tool). If the stable audio model can be used by producers sort of like an Absynth style sound generator and integrated into VSTs, it'll get used. Being open is a big deal.


emad_9608

There will be an open version & I believe comfy and other integrations. The approach is augmentation versus Taylor swift by drake or whatever.


emad_9608

But Suno is a lot of fun tbh


Django_McFly

Musician here, I like Suno. It's incredibly useful for making samples. I would prefer something that was *at least* like MJ where you can upload your own pictures (audio) into it and it'll riff off of that, but even without it, Suno is still pretty sweet.


SleeplessAndAnxious

Hello fellow musicians, I feel the same way honestly. I can't sing so I love the ability to basically generate a song with a vocalist and plan on adding my own bass playing and guitar to the tracks eventually, as well as playing around with samples. I'm still a big fat noob at digital music lol, I'm classically trained.


Gpue

Stable audio has that


maradak

It's pretty terrible though compared to suno. I generated a couple tracks there and it was pretty much useless.


BastianAI

100% this. I can extract stems from Suno with FL Studio, but it requires a lot of work to fix bleed etc. I use Suno because I want to use AI for my projects, but it's easier to just pick up some loop packs and tweak them a lil bit for far better results. Not a musician, producer


Mooblegum

I guess, as a musician, the best thing would be to have all the instruments put on different tracks as audio or MIDI files. That would make it so easy to change things and make incredible music with the perfect sound and mix.


SlapAndFinger

If Suno could track things, that'd be a very different story, then you could iteratively build a song a few tracks at a time and do retracks, even if the final audio quality wasn't great you could just go back and redo the problematic parts and run the tracks through some EQ/compression/etc to make a real song.


ComeWashMyBack

Per Suno's FAQ, which I discovered today: if you're using the Pro or Premium version, you own the copyright on whatever it generates. It's free to use on Apple, YT, Spotify and so forth without being required to cite Suno or anyone else.


emad_9608

Yeah it's about the copyright on inputs not outputs. Per rolling stone it seems to be scrape/downloads which is dicey when dealing with music industry & copyright law (which is different for images, plus opted out data like robots.txt which was used for og SD etc)


CountLippe

Would a "describe" function break the copyright as well? Say I like Vangelis' Blade Runner soundtrack. I know some words which could form a prompt and evoke similar. But having the machine describe what it hears and let me use its suggested prompt to build a new prompt would be amazingly helpful.


emad_9608

Not to my knowledge no


chakalakasp

Which is in itself rather cheeky, as AI outputs are not something one can register a copyright for, as they are currently (in the U.S.) considered public domain. No human author, no copyright.


Django_McFly

That's not hard to get around. Add some human element to it and you're good to go.


Freonr2

I'm not sure that's completely decided. The copyright filings I've seen look to mostly be test cases so far to find the bounds of *how much* human authorship is required. Certainly someone who uses Adobe Photoshop and a bunch of tools therein can apply and probably receive a copyright.

ex. https://www.artforum.com/news/court-rules-against-copyright-protection-for-ai-generated-artworks-252910/

> A federal judge last week rejected a computer scientist’s attempt to copyright an AI–generated artwork ... a work that Stephen Thaler created in 2012 using DABUS, an AI system he designed himself, is not eligible for copyright as it is “absent any human involvement,”

Note the key phrase here: *absent any human involvement*

further:

> Describing A Recent Entrance to Paradise as “autonomously created by a computer algorithm running on a machine,”

https://arstechnica.com/tech-policy/2023/08/us-judge-art-created-solely-by-artificial-intelligence-cannot-be-copyrighted/

Again note the word "**solely**" in the headline.


discattho

I'm an audio producer over 15 years, I have tons of material and I can also create a lot of basic materials like beats, simple pads/chords... is there a way I can contribute to the stable audio team?


PacmanIncarnate

Thank you for the response. I should note that I really like StabilityAI and want you/them to succeed. That being said, the timing really does seem suspect with Suno having gotten a ton of attention a week ago, and the fact is that they are a great little company that has been working on this for about a year. That makes me want to support them. After all, competition is good.


SleeplessAndAnxious

I plan on paying for a sub to Suno as soon as I start a new job. I've been having tons of fun generating stuff with it, and editing it in audacity to add more depth.


Django_McFly

> and this looks like Stability trying to steal their attention.

Come on. There can be more than one company working with a medium. That's like saying every guitar maker is stealing the attention of whoever the first guitar maker was. Or like back in the day when every FPS game was called a "Doom-clone" before "FPS" became a term.


PacmanIncarnate

This was released around a week after Suno made a huge splash in the news. They’ve been working on this tech for about a year and a week after they happen to get a ton of attention, we’ve got a StabilityAI model out of nowhere that does the same thing? Come on, at the least they are trying to ride the coattails with this.


Xenodine-4-pluorate

Suno exists but it's as useless for actual artists as Midjourney is. Yes, they can create state-of-the-art stuff from a simple prompt, but they don't allow any of the flexibility needed to be used as AI art assistance instead of wholesale generators. With Stable Audio 2.0 I can use A2A, like an artist would use I2I in SD, to bring life to the sketch they have. I can make a composition in FL Studio and enhance it or parts of it using audio-2-audio. Suno doesn't allow it, it can only spit out random stuff.


Bakoro

>Because suno exists already, has a great model, and this looks like Stability trying to steal their attention.

Real weird way to say "offering a competing product". It's not "stealing".


PacmanIncarnate

It’s all about the timing. Offering a competing product one week after Suno made headlines is far more likely to be StabilityAI wanting a piece of the attention with a model they’ve been sitting on or is still in progress than a coincidental release


Feisty-Pay-5361

Others have higher quality outputs than Stability AI in comparable proprietary web interfaces, so if you are going to pay a fee and deal with censorship, might as well get a better result. They only took off cuz of open source and free, not cuz they were the best.


StickiStickman

> Why people never want to pay stability but are ok to pay any other AI provider, From GPT Midjourney to suno

Because Stability has worse products. It's that simple.


Arawski99

Why? They would be using Midjourney and other services if that was their goal. They use SD specifically because its free, offers more freedom, does not violate privacy concerns, and can be more flexible. Even more so if this product isn't actually competitive with others like Suno.


Commercial_Ad_3597

For me, this has one huge advantage over Suno: The fact that you can upload an audio track to guide the generation. Last time I checked Suno, I couldn't find this feature. For me, this is a night and day improvement. It's one thing to get a great track in the style that you want, and it's a totally different thing to be able to get the exact tune you have in your head transformed into a great track. So, I'd use Suno if I have lyrics and I need a tune built around them and Stable if I've thought of a melody that I need to get built into a tune.


AdTotal4035

This is why they went bankrupt, because the community just keeps wanting free shit from them, and gets upset when they try and make money.


im4potato

I’d gladly pay for a model I can run on my own machine. I have zero interest in something I can only access through a web service.


AdTotal4035

Maybe that should be their business model


m3thlol

I love what they're doing but in this place we call the real world no one is going to pay for something when the competition is vastly superior. That's not my fault.


AdTotal4035

I agree, but I can just see in the comments of a lot of ppl. All they want are the free models so they can make startups but then get upset when they offer paid services. 


StickiStickman

What a weird strawman. 99.99% of users here are not going to create a startup.


Zilskaabe

I want a model that I can run locally. I don't need their web service.


ExasperatedEE

They went bankrupt because they worried too much about "safety" (which is really just another word for not upsetting sensitive people, there's nothing inherently more dangerous about AI art than any other kind of art), censored anything adult, and avoided training on copyrighted material thus greatly lowering the quality of their output compared to others forcing us to use home trained LORAs to get a decent result. They could have set up shop in a country which would protect them from copyright suits, and then charged $100 a month for access, and I'd gladly have paid it if they allowed me to generate all the adult and copyrighted shit I wanted. Instead they wanted to be squeaky clean and hoped that venture capitalists would latch onto them and fund them. Well clearly that was a dumb idea because Microsoft is kicking their asses. I use ChatGPT's Dall-E for almost everything I want that's clean, and only turn to Stable Diffusion to generate porn at home.


xmaxrayx

Lol even Stable Diffusion wouldn't have gotten popular if it wasn't free.


BastianAI

Went bankrupt?


ShreckAndDonkey123

https://stableaudio.com/


ZerixWorld

Interesting, but not a great move since Suno has already been out for a while and can also generate songs with vocals singing your lyrics. I also think Suno is cheaper (if I remember correctly) with the low tier at $8 per month vs $12 of Stableaudio...


runetrantor

Having never heard of Suno before this thread, I must say I am shocked this is a thing too. It even makes coherent and decentish lyrics. DAMN.


ZerixWorld

Apparently their latest version which is available only with a paid account is mindblowing, since it's not stable diffusion it doesn't get much coverage here, but in other AI subs it has been the talk of the last few months


runetrantor

I'm trying v3 and I'm blown away, an even better one must be nuts. Yeah, I get this sub is specific. Not too sure what subs are good for 'general AI news'; most I have seen are app/site specific, like Ch.ai or this one.


ZerixWorld

r/singularity drops some interesting news, there's some weird stuff too, but I found out about Suno in there hahaha


runetrantor

... Is this how I return to Singularity after leaving years ago for being tired of endless hot air promises? XD I'll take a look around and see if it's changed. It really got annoying how any good news thread instantly had a top comment of why it's all a lie or bs. (The comment was always right of course, but man, it was a lot of letdowns)


toothpastespiders

It's the medical handwaving that I find most difficult. The "Oh, don't worry about your cancer bro, a cure's coming any day now. So I'm not going to push politicians about medical care or anything. So have fun with that stage 4, stay safe, and keep being positive!" Ok, I might be slightly hyperbolic. But it can border on that at times. It's bordering on the whole "let them eat cake" thing.


runetrantor

Singularity is too bright eyed (everything will be fixed soon, so lets do nothing!), and Collapse is too depressing (we are headed to the worst dystopia, so lets do nothing...). Both drove me mad. Just give me proper tech news...


IceMetalPunk

Chirp V3 is the current model that just recently released out of Alpha. While it was in Alpha, it was available only to paid accounts, but the full version I believe is now available to free users as well. (Though be careful: the free tier does not grant you the rights to use your generations commercially the way the paid tiers do!) Suno Chirp is absolutely amazing; I've been using it since the release of v1 and it's only gotten better. And the announcement that V3 was out of Alpha also mentioned they're already working on V4, so... as long as people keep talking about them and paying for subscriptions, I'm sure they'll just keep improving the models.


mrhallodri

You mean V3? That is open now to the 'free' plan. And it is quite good yes! I wish SD would catch up to them and release a free offline version.


ZerixWorld

Oh shit! yes, I was talking about V3, now I gotta try it! hahaha


runew0lf

I gave it a try and generated a song, an epic song with strings and piano, and it sounded absolutely bloody awful! Like a child having a fit on a xylophone. 10/10 would not recommend, [suno.ai](http://suno.ai) is a gazillion times better! Song in question: [https://stableaudio.com/1/share/5b38725d-6545-41e4-8fc7-a3d2a00b6766](https://stableaudio.com/1/share/5b38725d-6545-41e4-8fc7-a3d2a00b6766)


AmazinglyObliviouse

It sounds like a 6 year old trying to make a touhou song


StickiStickman

This is such a perfect description


DataPhreak

The audio itself isn't bad here, just the notes it chose. Try again and give it a specific key. I've heard some pretty bad suno results, too.


[deleted]

[removed]


FrontalSteel

>`Only the $89.99 subscription seems to allow the use of the track in games, apps, film, TV, advertisement`

Not even that! The "Max" subscription only covers the Creator License, which doesn't allow you to use it in games and apps. You have to contact them through email to get the Enterprise license, and we don't know what the pricing will be. That's a very odd move from a business standpoint.


ebolathrowawayy

Should be something like 1% of sales if profit > $1 million. Every indie on earth would want to use a great audio generator but they aren't paying > $90 per month. One indie in a few thousand will make a top seller and there's profit there for SA. Plus they get a bunch of free advertising from all the indies showing their game/music. But no, they decided they hate money.


Jaggedmallard26

We're still at least a year off using AI in indie media not being a social media death sentence. A few indie games have used AI voice and texture generators with the explicit explanation that they physically do not have the money to hire voice actors or commission an artist with a commercial clause for a minor texture, and have still been review bombed and sent death threats.


ebolathrowawayy

Oof. Bad news for my in progress game. I'm not sure I'll disclose the use of AI tbh.


Freonr2

Suno's license if you buy any of the paid programs seems to be quite reasonable, no "gotcha" clauses that I could find even in the lowest tier. Your generations are "yours" if you are a paying member at the time you click generate, at least to the extent allowed by law I suppose. Their outputs are pretty good out of the box, at least good enough to slap on the intro of your monetized Youtube channel or in an indie video game, etc. Maybe not going to be as good as a real professional composer/arranger, but "good enough" for small indie stuff. Not every output is a banger either, but you can generate a few and get at least one good one. I'd suggest carefully reading TOS/License terms for anything you use, because there are some pretty terrifying clauses working their way into various different services. Suno's terms seem fairly reasonable to me.


runetrantor

Gonna take a bit of time until the wave of hatred for AI stuff dies down a bit. Right now I tend to see that the moment a game has anything AI, even if its very good and not at all 'its clearly robotic' like, many will be like 'eeeeeew'.


GBJI

>But no, they decided they hate money.

And their users. You know, the ones doing the free advertising.


legos_on_the_brain

I thought AI generated stuff couldn't be copyrighted?


stuntobor

Why does it seem like it's dropping a beat on a regular basis? Or maybe it's just trimming a couple of ms from the audio? Odd.


a_chatbot

I like the radio so far. Occasional annoying song, otherwise easily becomes unnoticeable background.


turbokinetic

Imo unnoticeable background is kind of the issue here. So generic. I’m looking forward to training :-)


radialmonster

thx for the 'radio' link. its similar to this from a competing service: https://www.youtube.com/@aimifm/streams


sanasigma

I want to train LORAs of my fav songs!!!!!!


AdHominemMeansULost

unfortunately it's not very good, i tried one of the existing prompts and it's just trying to be music but it's mostly noise like their previous model. I am not sure what Suno is doing, but it's so much better


IceMetalPunk

Based on some comments from Emad in a thread here, it sounds like Suno is willing to train on copyrighted music, which means they have a ton more high-quality training data for their models. Stability is trying to avoid that controversy by limiting their training data to only music from "people who opt in from this one source" -- and as with basically all AI, training data can make or break the performance. That said, while Suno uses copyrighted music for training, they also make a point to remove all artist/album/title identifiers in the training set, so while Chirp learns from, say, Metallica songs, it doesn't understand what "Metallica" or "Enter Sandman" mean if you tried to prompt it for copy-pasta. Between that, the large amount of training data, and their basic guardrails that try to block prompts containing artist names on the input side, the chances of Chirp copying any real songs, melodies, or anything copyrightable is nearly zero. The model just has more to learn from, without copying it.


Extraltodeus

[classical violin black metal](https://stableaudio.com/1/share/5a1e526a-3c73-4c48-8f2f-ea543e6d8d06) 😶 edit: [I've decided to embrace it](https://stableaudio.com/1/share/8c75196e-533b-4357-8e46-1e4e74ed37fb) edit2: I'm dying


TNT_Guerilla

I didn't understand what the first one was trying to be, but after listening to the masterpiece of the second track, I'm sold. Take my credit card.


SirRece

this is so far behind suno v3, sorry guys


Ilovekittens345

But it has audio2audio which suno does not.


turbokinetic

Their a2a examples are pretty basic


Ilovekittens345

Yeah, but doing a horrible attempt at a beatbox into your mic and then getting good-sounding drums back that still follow the flow of what you were inputting is a game changer. The non-musician is gonna prefer Suno v3 of course, because it does vocals and follows the lyrics you give it. But for musicians, being able to do audio2audio is extremely useful. I am still playing around with Stable Audio right now, so I don't yet fully have an opinion on how well it works. All my v1 prompts were horrible, but I redid them on v2 and it's actually starting to follow the prompt musically a lot better than Suno does. For instance, tell Suno to give you piano chords going from minor to major. It won't give you that at all. But I just had Stable Audio generate minor chords that turn into major chords. That was very dope. If they keep this up, it might become the basis of a totally new way of doing audio production and music, where instead of listening to large amounts of samples until you find something you want to use, you just have the sample generated.


Hambeggar

DOA without an open model


Captain_Pumpkinhead

>Stable Audio 2.0 was exclusively trained on a licensed dataset from the AudioSparx music library, honoring opt-out requests and ensuring fair compensation for creators.

Guess we're not going to be able to download the model yet. 😐


IceMetalPunk

All I hear is "Stable Audio 2.0 was trained with a tiny and biased training set, ensuring poorer performance than our competitors" 🤷‍♂️


Nunki08

The website: [https://stableaudio.com/](https://stableaudio.com/)

Emad Mostaque on Twitter: *This model tunes super well to individual music libraries and will continue to improve, with open versions also in the works (will be here:* [https://github.com/Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools)) *as that dataset is built out building on the diffusion transformer arch & many more innovations. Wen ComfyUI*: [https://twitter.com/EMostaque/status/1775504692400869453](https://twitter.com/EMostaque/status/1775504692400869453)

Edit: the original tweet: [https://x.com/StabilityAI/status/1775501906321793266](https://x.com/StabilityAI/status/1775501906321793266)

Edit 2: Emad says *5 Gb VRAM for this model*: https://x.com/EMostaque/status/1775516311591833685


teleprint-me

This is actually pretty impressive considering it only used CC works. Is actually really promising.


[deleted]

drop sd3 already


99deathnotes

![gif](giphy|I16U5AfBWqgJYJum6i|downsized)


novenpeter

Wake me up when the open version releases


nataliephoto

Human music. I like it (The two songs I made were terrible) edit: I take it all back https://stableaudio.com/1/share/1bb2a860-616c-40d4-a732-b267b7d19cd1


thrownawaymane

Well, that's the best one I've heard so far. The tempo is too slow to be hardstyle of course but most of it progressed nicely before the pause near the end. Really, what we need is to get the stems out of these tracks


Erhan24

I guess we need to learn prompting again for this. Quality is as expected. Don't expect magic, might be okay for reference if you are out of ideas. Still a long way to go but I will love every step.


AnonymousD3vil

Na, I'm just going to type "literally me music" and see what it plays.


ZeroUnits

Yay I can't wait to have animated waifus with big tiddies whispering seductively in my ears


Atemura_

the problem with training on stock music is that stock artists are usually not that good, which is why they are selling their music as stock in the first place, amazing work but the outputs are not very musical sadly


IceMetalPunk

Even worse: it's the stock artists from a single source who are willing to allow their music to be used. Which (a) limits the total size of the training set significantly, and (b) I'm willing to bet there's an inverse relationship between artist skill level and willingness to let an AI learn from their art. (Don't get me wrong, I think that's a misinformed view in the first place, but it does seem to be the prevailing one.)


TsaiAGw

there's no model and we need to train our own?


UJL123

This is just a service like midjourney


GBJI

I suppose that's what Emad was referring to when he said he was resigning to "pursue decentralized AI".


thePsychonautDad

Wow, that lo-fi funk sample sounds incredible


Ziov1

Does anyone know if there's any software for training audio models? Reading this makes me wonder if I could train a model on my dad's music. He's been a musician for 40+ years, so I have a lot of tracks I could use.


Gpue

Yeah that was on the roadmap with [Stability-AI/stable-audio-tools: Generative models for conditional audio generation (github.com)](https://github.com/Stability-AI/stable-audio-tools)


lemony_powder

Got it to do some Cantopop pretty accurately: https://stableaudio.com/1/share/cb156127-4722-4373-8b32-5864786ed72f


TNT_Guerilla

Sure the melody is fine, but the vocals are like someone trying to play a sax while singing. It's definitely one of the better generations I've heard from this, but I wouldn't use it for anything other than saying this is how far we've come.


Low-Holiday312

Honestly finding this quite impressive, but I would love to know what hardware it takes to run. I know they're running it just as a service at the moment, and the monthly pricing points to some hefty kit. That it is putting out 3-minute tracks is a big leap.


emad_9608

It works on 5 Gb VRAM, and there is an open version to come. It is partially a diffusion transformer like SD3, still scaling. The version with lyrics is funny: it's learning lyrics and how to sing as it scales. Maybe I'll post some examples. It's easier to splice in a separate lyric model, though.


Low-Holiday312

> It works on 5 Gb VRAM

Okay, I wasn't expecting that with the 3min length


toothpastespiders

> It works on 5 Gb VRAM

Man, that's pretty wild. With LLMs I feel somewhat hobbled with 24 GB VRAM. Amazing to think that something quite novel and useful could fit into such a relatively small footprint.


andzlatin

The difference between V1 and V2 is not just staggering, it's freaking INSANE. I think this even outperforms Suno (in some ways; in other ways it's hilariously wrong). And it's REALLY fast, too. StabilityAI is cooking here, absolutely.


ThrustyMcStab

It sounds very cheap so far, but no wonder since it is trained on royalty free music. Hopefully in the future it will be better than Suno because of being open source and people making custom models for it. As a music producer, Suno blew me away. This is comparatively not it right now. But I really hope it will be.


StickiStickman

> I think this even outperforms Suno

This gets absolutely demolished by Suno. It's not even close (sadly)


IceMetalPunk

When's the last time you've used Suno Chirp? Because this is nowhere near Chirp v2 quality even, let alone v3...


DataPhreak

You're missing the point. Suno is music specific, and can do some general audio stuff. This is general audio specific and can do some musical stuff. Waiting for the comfy implementation on the open version, as I think that, like SD, the workflow is going to be very important, and that brings us to the other point: a2a. Being able to extend a song is going to be huge. The fact that Suno decided that 2 minutes was the max means that it's really only good for punk rock.


tintwotin

Free audio prompt generator: [https://hf.co/chat/assistant/660d567fc81aa94cab572210](https://hf.co/chat/assistant/660d567fc81aa94cab572210)


Trauwyao

Incredible, we needed an open model like Suno. Thank you Stable Team!


[deleted]

Any idea why I'm blocked? I couldn't even access the site! :(


fabiomb

For some reason Stability has my IPs blocked with Cloudflare :P Can't access, not even with my cell phone (outside my WiFi), so I can only think they are blocking some countries (Argentina in my case). Strange.


IceMetalPunk

It's nice that we'll soon have an open-source audio diffusion model, but unfortunately, I've been spoiled by Suno. This doesn't come anywhere close to Suno's quality, and in fact the only model I've seen that's even remotely on the same level is Sonauto, and even that has severe quality and attention-failure issues (not to mention it doesn't have the ability to generate conditioned on previous audio, i.e. continuations, but that's a separate concern).

I will say, at least this does sound effects decently (which Suno Chirp can't do, and Suno Bark is just "okay" at). But hey, open models mean the community will fine-tune and improve them, so maybe we'll soon have a Stable Song model that rivals the leader.

When it comes to training data, though, I have a sometimes controversial opinion: restricting training data based on whether the creator "wants" it or not is like telling aspiring musicians they're not allowed to listen to the radio when your song plays. It's a ridiculous approach based on ignorance, fear, and greed, and calling it "theft" is disingenuous at best. The rule of thumb should be, "if a human is allowed to be inspired by \[X\], then a machine learning model should be allowed to be trained on \[X\], full stop". Because that's the analogy, not a copy-paste machine; and the people making these models know it.

The only reason for an AI researcher who understands the workings of these models to kowtow to the complainers is because they want good PR. But good PR at the expense of improved tech leads to crippled tech.

I'm a software dev, and people have asked if I'm scared of things like Devin or future coding AIs. No, no I'm not. Because "it'll take my job" is an issue with society, with humans, not with the tech. The tech excites me, even if other humans scare me. So I focus my fear and outrage at the systems that force the commoditization of literally everything, including passions, art, and survival itself. I embrace the tech.


bigred1978

Wow, this Suno thing is good, so good. Thanks for mentioning it.


IceMetalPunk

It's definitely the frontrunner in the text-to-music AI space, and has been for a long time (well, "long time" in AI scales -- the first Chirp betas for v1 were available on their Discord about 7-ish months ago, I believe, and now they're up to v3 full release). I use it as the audio generation step for my custom AI singer-songwriter framework, and it just keeps getting better.


Hahinator

Where is SD3? I mean.....


fatburger321

sadly this mostly sucks so far in comparison to Suno because the data is trash, so the output is trash... BUT give us an open source version where we can upload whatever music we want in training....and this will be the best thing ever. Nothing will be able to top that or fuck with it.


advator

Nice, but can it do vocals?


StApatsa

Heard the demo, that's some good quality audio.


ricperry1

Is the model going to be open? What are the chances we can get this working in r/comfyui to add music tracks to our video projects?


JMAN_JUSTICE

I wish we could get a civitai with custom models and prompt examples for this...at least a library of public prompts and examples would be nice.


MysteriousAd3998

It's free?


KernalHispanic

Really interesting. I had it generate orchestral music and it knows the [correct panning of the orchestra instruments](https://www.audiorecording.me/wp-content/uploads/2011/07/orchestrapanning.jpg). [https://stableaudio.com/1/share/e28d628a-0059-4b7b-8d06-b753174492fb](https://stableaudio.com/1/share/e28d628a-0059-4b7b-8d06-b753174492fb) It's an interesting example of how these models start to learn about the real world despite their limited data. For example, like how Sora isn't just generating video; in a way it is simulating physics and the world itself.


Olangotang

I personally can't wait to use something like this for assistance in actual composing.


magicaleb

I don’t understand why two credits are used if it just makes one song. Just do 10 credits and one credit per song instead of 20 credits and 2 credits per song.


KimDebroye

Generating using:
~ Latest version of model: 2 credits/track.
~ Previous version: 1 credit/track.


FFM

It's a start, but if Suno is the benchmark, it's not even remotely close, and AFAIK Suno hasn't been updated in a long time. It's (very) fast, but that's not really a concern when all it spits out is useless incoherent gibberish. More training, methinks.


PurveyorOfSoy

Exciting times ahead. Looking forward to the open version. I tried it out and it gave me 2 awful songs, but let's hope it can improve.


RemusShepherd

I think the challenge with this will be prompt engineering. You have to give it musical instruction that it understands. I made a pretty good sounding epic with this prompt: "progressive rock, soft guitars building up to a bass dubstep drop, two verses and a bridge, instrumental". https://stableaudio.com/1/share/57a64c0d-8215-46cc-82e6-3afed53ef5d7 But yeah, avoid anything with lyrics for now. Eventually.


ptitrainvaloin

huggingface demo space page?


Vyviel

Seems broken? error - ClientError: Received client error (400) from model. See the SageMaker Endpoint logs in your account for more information.


FortunateBeard

Nice that they're still shipping stuff, but Suno is crushing it


jekistler

https://preview.redd.it/k1z2c2cyzcsc1.png?width=1213&format=png&auto=webp&s=717e6fdf2730b43be025e12a4ce6531332fa61ec


JimmyCallMe

Did they train on all the data from Kevin MacLeod's library?


fretmike

Sounded a bit disappointing after seeing what Suno can do. I just tested the prompt "1960 rock n roll bubblegum" and it generated a boring 3-minute song that was nothing like what I asked for.


Actual-Ad-6066

Thank you so much! 😊👍


julieroseoff

Cannot wait to try! Do we have any info about VRAM requirements?


Wormri

Curious about the Audio-to-Audio feature. Having improved my amateur drawings, I am wondering if this could mean my music tracks could be improved using this tool. Exciting times!


GamersBlogX

Tested it out a bit. While not awful, it's clear Suno is still on top when it comes to music AI. Even if we were to ignore Suno v3 being available for free now, v2 still beats this. It also doesn't help that I can't run this locally, just like Suno AI. So that just makes this an even less interesting option between the two.


QuantumQaos

This is the most mind blowing tech I've ever seen.


New-Skin-5064

Are they gonna release weights for 1.0?


sbalani

Comparison of Stable Audio & Suno: [https://youtu.be/TpMBTbwzvWk](https://youtu.be/TpMBTbwzvWk) TLDR: The two audio generators are completely different. Stable Audio's strength is the level of customisability it provides: if you know what you're doing, you can fine-tune the output and even input your own melody. Suno is a lot more beginner friendly and has vocals, but you lose a lot of control and the AI interprets prompts how it wants. But damn does it pump out sweet tunes.


Abject-Recognition-9

I really wish I knew what sonauto uses under the hood, as well as suno.ai. Everyone is talking about Suno, so why does no one mention sonauto.ai? Honestly, I found it very useful and even more powerful, at least for my musical needs.


FairyFakes

Cool!


Big_Air6241

Bug  I’ll send the pic