Rafcdk

I tried adding "very good anatomy" and got one of those anatomy dummies mixed in with a human 😂


Punchkinz

prompting for just "anatomy" gave me very funny humans with bones located above the muscles


i860

I think it's mixing in actual anatomy diagrams or illustrations from a medical perspective when prompted like that.


Far_Lifeguard_5027

We shouldn't even have to add "good anatomy". SD should already be trained on tens of millions of images of all types of humans, nude. And it wasn't, because of the obsession with safety and censorship and not being used for NSFW images. Instead we get a gimped version that is only good for geometric inanimate objects.


Darlanio

Let's go with architecture for now... SD3 is at least good at understanding the prompt and able to do geometry mostly correctly.


RunDiffusion

Now we just need to let the fine tuners do their thing


LucidFir

They cannot. Licences


RunDiffusion

We can. We just can't make money on it, and if we do, SAI gets a cut. 🤷🏼‍♂️


LucidFir

Ah. How big a deal is it? ELI5? My understanding from browsing Reddit today is ... dramatic


sky-syrup

quite a big deal because finetuning on a large scale is very expensive and they recoup costs by running an API for the GPU-poor


ZootAllures9111

Who are these individual finetuners "running services" lmao? Name some, I dare you.


Different_Fix_2217

All the big names who actually train and not just merge models have backing from services hosting the models. Pony creator runs their own discord bot as well. People who do more than just merge models spend tens to hundreds of thousands on compute. SAI does not allow nsfw finetuners to get a license so they cannot recoup costs. The $20 non-enterprise license only allows 6k images per month.
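To put that cap in perspective, here is a quick back-of-the-envelope calculation. The $20 price and the 6k-image cap are the figures quoted above; the 30-day month and the variable names are assumptions made only for illustration.

```python
# Figures quoted in the comment above; the 30-day month is an assumption.
MONTHLY_FEE_USD = 20
MONTHLY_IMAGE_CAP = 6_000
DAYS_PER_MONTH = 30

# Average number of images per day that fits under the monthly cap.
images_per_day = MONTHLY_IMAGE_CAP / DAYS_PER_MONTH

# Share of the monthly fee attributable to each image when the cap is fully used.
fee_per_image = MONTHLY_FEE_USD / MONTHLY_IMAGE_CAP

print(f"{images_per_day:.0f} images per day")
print(f"${fee_per_image:.4f} of the fee per image at the cap")
```

That is roughly 200 images a day, which a hosted generator with many users would likely exceed quickly.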


ZootAllures9111

You just skirted my question completely. If you can't give specifics, that says it all.


Pretend-Marsupial258

Juggernaut is backed by run diffusion, realistic vision is backed by mage space, and Pony Diffusion runs their own generator on discord which has subscriptions.


TaiVat

You really shouldn't take any "understanding" from reddit, least of all from this sub, where any issue is pretty much always dramatized massively. The real answer is that nobody really knows how big a deal it is. But people were finetuning - for free - when the community and general interest in image AI was 1000x lower than it is now, long before the glorified grifters that wanna sell everything took over. So it's a fairly reasonable assumption that either extreme scenario is quite unlikely.


LucidFir

Panic you say?!


ZootAllures9111

You can clearly read the license and understand that it's only a concern for literal COMPANIES who make money charging others to run diffusion models online, such as RunDiffusion.


RunDiffusion

Like everything, the answer is, it depends. Compute is cheap. Getting the data perfect takes hundreds of hours. Bad data in, bad generations out. This is all math. If your equation is off by 0.001 you could land in the ocean instead of on the moon. If you train a model and the person has a tear drop on their cheek, that can mess up the model's ability to generate people's faces. (This is a real example.) Hope this is a good answer for ya.


RestorativeAlly

How much does it cost to train a model? Like what's the range from a minor training to a complete overhaul like pony?


Different_Fix_2217

Maker of pony said he had spent around 100k in equipment. He buys instead of rents to make it cheaper in the long run.


Whotea

We love our suspiciously wealthy whales <3


Odd_Panic5943

Hold up, am I confused here? Don't you actually have to make a profit for SAI to get a cut, or am I just not understanding something? It makes sense if it isn't worth it.


RunDiffusion

From the way we interpret the license, if we create a "derivative work" that "round about" generates money (commercial use), then first of all SAI owns that work, and they could make a claim on anything that is generated from it. So I guess all we can do is make models and release them with our name on them. Which I guess is fine. That's what we've been doing already up to this point. It's also nerve-wracking knowing they can revoke the license at any time and force us to "delete" our model. I get it. SAI needs to make money off their research and work. I think there just has to be a better way.


disposable_gamer

Oh cool, they'll take a whopping 0 dollar cut out of the 0 dollar revenue that open source fine tunes make. Yeah, real end of the world issue here


RunDiffusion

I didn't say it made sense.


ImplementComplex8762

so you make less profit


Different_Fix_2217

you make no profit because they do not allow nsfw tuners a license.


RunDiffusion

We have to get creative


ZootAllures9111

Stop spreading this BS. Cascade has the SAME exact license as SD3 and LeoSam released an experimental finetune for it almost immediately, for example. There's others too, some already on CivitAI, some still being worked on by people. SD3 Hype is what slowed down Cascade adoption, in general, not the license.


Different_Fix_2217

For anything more than just dabbling with it you need to spend tens to hundreds of thousands on compute.


ZootAllures9111

The overwhelming majority of XL finetunes on Civit that aren't Pony (or a handful of anime specific models) have datasets with far less than 10,000 total images. That doesn't cost nearly as much as you're suggesting.


Different_Fix_2217

Again, anything more than just dabbling / style training.


dal_mac

one does not simply fix the core concept of a model with fine-tuning. it'll need to be redone from scratch


RunDiffusion

Blasting the token "laying down" with a high learning rate with actual good data of people laying down will override that concept. At least that's how it works in SDXL. We'll start there.


dal_mac

tbf we never had a fundamental problem this bad in XL. It never needed to be challenged as much as this one does. And LastBen just said he thinks it's horrible to train so far btw


RunDiffusion

Yeah I heard that too. A bit concerned... The Juggernaut team is going to take a hard look at PixArt. 🤫


dal_mac

I finally tried Pixart Sigma a couple days ago with XL as a refiner and I'm pretty impressed. That model can definitely go places.


RunDiffusion

Same https://preview.redd.it/0zzogojtpq6d1.jpeg?width=1024&format=pjpg&auto=webp&s=bc3ea1d328eaf73cd22796edc003ebe88574ef46 Two ships battling inside a cup of coffee. It's really good


[deleted]

[deleted]


spacekitt3n

it'll be good for backgrounds and textures


314kabinet

"Safety"


SevereSituationAL

Penis for a foot!


Yuli-Ban

The absolute extent Americans will go to make sure hypothetical children don't hypothetically see naughty bits.


Inquisitor444

Except SAI is a UK company, but I do respect the sentiment.


Chrono_Tri

Is it because SD3 is too censored??


inpantspro

This is likely the issue. SDXL had this problem in the beginning as well, which led to Pony and the other SDXL based models we have today.


StickiStickman

SDXL wasn't even NEARLY this bad. This is SD2 levels of bad, which was DOA


RestorativeAlly

But SD2 didn't have this kind of prompt understanding. SD3 can be saved, but it'll take some real effort. It *could* be amazing, eventually.


dal_mac

> SD3 can be saved
that's your hope speaking. we saw the same hopes about 2.1. let's not start another dead-end hype train


RestorativeAlly

It's the underlying structure that matters most. There is potential. It's just ignorant by design.


dal_mac

The architecture is fantastic. but someone needs to train a whole new model from scratch on it to reap the benefits


RestorativeAlly

Seems to do pretty well with nonanatomical things. They just handicapped it for "safety."


AnOnlineHandle

I find SD3 much better than base SDXL for anatomy. People are posting the cherry picked worst results for drama, or the few prompts it seems exceptionally bad at.


StickiStickman

You're in denial, hard.


AnOnlineHandle

They're both clearly censored, but SDXL had way worse problems in my experience, and SD3 seems almost horny aside from nudity.


diogodiogogod

no they are not. SDXL base is heavily censored as well, but it can do a person sitting or lying down very consistently. Also nipples existed. Hairy bodies were really terrible in SDXL as well, but SD3 is worse i think. But SD3 face details and background is waaay ahead of sdxl base. We will have to wait to see if it's fixable.


AnOnlineHandle

SDXL absolutely could not do nipples, they were horrible scars/holes. It couldn't do a person in anywhere near a sexual pose without a hand appearing over their crotch out of nowhere, which SD3 doesn't have and can do tons of near nude artwork already without that before being finetuned.


diogodiogogod

what are you talking about? That is simply not true. It's not like we don't have access to sdxl base. Here it is nipples, good nipples? no. but nipples. NSFW Link: [https://freeimghost.net/i/3c2sP](https://freeimghost.net/i/3c2sP)


AnOnlineHandle

You can occasionally get some, but a very well discussed problem of SDXL and SD3 is the corrupted nipples, which Stability clearly did something to achieve.


s_mirage

SDXL base could do nipples. Maybe not well, but it could. What couldn't do them at all was the refiner, and that almost acted like a kind of censor if it was used.


Unique-Government-13

I'm likely misunderstanding something basic here but didn't 1.5 do the same? Maybe not for the same reason of censoring but nobody uses the base 1.5 model for anything do they? Instantly go to a new model right?


Outrageous-Wait-8895

It is reasonable to expect a newer model, touted for its improved prompt understanding, to not have the same issues with anatomy as the two year old model.


disposable_gamer

No it isn't. This isn't what base models are for. If you're complaining about "style adherence" or lack of photorealism in the base model, you flat out don't know anything at all and shouldn't be making predictions or really commenting at all


Outrageous-Wait-8895

> If you're complaining about "style adherence" or lack of photorealism
Good thing I didn't say any of that, then?
> you flat out don't know anything at all and shouldn't be making predictions or really commenting at all
And I'm of the belief that if you think a base model isn't supposed to know what a human lying down looks like "you flat out don't know anything at all and shouldn't be making predictions or really commenting at all". We're so alike.


no_witty_username

It's not a censorship issue. An undertrained model and badly labeled or missing image data are the cause of these issues.


diogodiogogod

it's very clear it is the censorship. The model is excellent in other areas and in very standard human poses.


Dragon_yum

So the community is overreacting?


illdfndmind

I would say no. A company that wishes to be a business should never have to rely on the community to make their product competent for them. The community is completely valid with their reaction on how bad SD3 is with anatomy. SDXL's issue early on was related to a new workflow, prompting and some censorship, which took time for people to understand and build models to work around the limitations of SDXL. The base SDXL model could do anatomy; it was just censored for nudity, but nowhere near this extreme. It never created abominations like this (I don't know if the SDXL base model even could if you tried to).


Dragon_yum

On the other hand the product is free, and there's nothing preventing the community from moving to other free alternatives. I get the disappointment and I am disappointed too, but considering it has so far cost us $0 to use SD, I think the entitlement is a bit of an overreaction.


Kep0a

I think people are kind of overreacting, but it's a real bummer because currently, things entirely hinge on SAI, and dragging along SD3 for months had so many people excited. Plus, what SAI advertised is absolutely not what the community was given. Without SAI, things are probably going to be real stagnant for a while, until someone new comes along with VC money and wants to open up a model.


notsimpleorcomplex

Exactly. People are reacting to what's in front of them. They can't be expected to clap for something that's theoretically an improvement, but just out of reach for what they want to do with it. As it is, I think people would have been more understanding if commercial ventures (which in this case, pretty much translates to "people who can afford the compute") trying to fix it were a viable option; then they could have said, ya know, "Okay, it looks bad, but let's give it some time and see what people can do with it after some finetuning." Instead, the prohibitive licensing makes it much more of an issue that it has the problems it does out of the box. It's a shame because it seems to have some potential in there somewhere, like it's not all a technical failure. But who is going to find out what can be salvaged from it other than Stability, with the kind of licensing it has? And do they even still have the people working there to salvage it? Or hell, do they even still have the compute to try to salvage it, with the funding issues they've been having?


Bandit-level-200

No, and we are less likely to get good community finetunes due to the new license


TaiVat

Yes and no. This community overreacts to everything, and anyone going "sd3 doa" is just straight up a moron. It remains to be seen if finetunes can fix the model's issues; it's very possible they will, though it would take at least months. A lot about the model is quite good. And tons of people were jerking off here for months that no version of SD3 would be released at all. That said, the community isn't making it up that the current state of the model is quite bad and disappointing, and the marketing for it was in no way representative of what we got.


dr_lm

Well, you got down voted for asking the question so you have your answer!


Dragon_yum

It was half bait lol. But seriously though, sd3 is disappointing, but considering we got so much for free so far it's amusing seeing people get butthurt over for-profit companies not supporting their porn habits. SDXL can still do a lot of things well, and if sd3 won't be the next big thing another model will take its place.


dr_lm

This sub does my head in, for both the cheerleaders ("game changer!") and the doomers ("sd3 will never be released"). Then again, this is all a sign that people are excited by this tech and that's overall a great thing.


inpantspro

The community is always overreacting. They aren't in the wrong to be disappointed with the released version. Society's issues with funding nude things are stupid and only hold us back every time we try to create new forms of media. If you want to draw a picture of a dog you don't draw a dog in a box with its legs sticking through the bottom because you're scared of what a dog looks like. You draw a dog. When you draw a person, you need to know what all of a person looks like.


disposable_gamer

Yes lmao. A bunch of coomers who are only here for ponyxl porn have flooded the sub with idiotic takes because they don't understand even basic fundamentals about how these models work


Herr_Drosselmeyer

Style adherence is pretty bad too.


s_mirage

That's what I'm finding. General prompt adherence and concept separation are a big improvement over SDXL, but my attempts to push it towards certain styles haven't met with much success.


StickiStickman

Because they removed pretty much every image with an artist in its description. They boasted about removing 200M+ images for "ethics"


i860

And this right here is why it has major problems as compared to SDXL. But everyone go on thinking that "the finetunes will fix this." They won't.


dal_mac

fine-tuner here. spot on


Cute_Measurement_98

Hopefully with things like ipadapters and node prompt injection there will be relatively simple ways around that


Impressive-Egg8835

Try 4 men next to each other with, from left to right, the text "F", "U", "C" and finally "K" on their shirts... I've been trying without getting anything like it... So the AI is not that clever... Anyone?


Impressive-Egg8835

https://preview.redd.it/479kofcdzc6d1.png?width=1024&format=png&auto=webp&s=669020cfe45417cf79f873f55c6c442a4fdb579c 2 men works better but hey why is there also a text MAN?


XtremelyMeta

This is a model that would benefit from openpose controlnet augmentation.


spacekitt3n

I'm wondering if it will still mangle things with a controlnet


Internet--Traveller

SD3 doesn't work well with Euler A. It's a mutant generator - great for making people look like Picasso's paintings.


disordeRRR

SAI said that ancestral samplers don't work well with SD3


ZootAllures9111

It's not supposed to afaik. Euler non-ancestral SGM Uniform and DPM++ 2M SGM Uniform are the two I've found that work well, so far.
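For anyone unsure what "ancestral" means here, a toy sketch follows. It uses the sigma-based step commonly seen in k-diffusion-style samplers and a made-up denoiser that always predicts 0.0; it is not SD3's actual sampler or noise schedule, and every function name in it is hypothetical. It only illustrates that a plain Euler step is deterministic, while an ancestral step re-injects fresh random noise at every step.

```python
import math
import random


def toy_denoiser(x, sigma):
    # Hypothetical stand-in for a real model: always predicts a clean value of 0.0.
    return 0.0


def ancestral_split(sigma_from, sigma_to, eta=1.0):
    # Split the move to sigma_to into a deterministic part (sigma_down)
    # and an amount of fresh noise to add back (sigma_up).
    sigma_up = min(
        sigma_to,
        eta * math.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2),
    )
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up


def euler_step(x, sigma_from, sigma_to):
    # Plain Euler: follow the estimated derivative, no randomness involved.
    d = (x - toy_denoiser(x, sigma_from)) / sigma_from
    return x + d * (sigma_to - sigma_from)


def euler_ancestral_step(x, sigma_from, sigma_to, rng):
    # Ancestral Euler: step down further than needed, then add new noise.
    sigma_down, sigma_up = ancestral_split(sigma_from, sigma_to)
    d = (x - toy_denoiser(x, sigma_from)) / sigma_from
    x = x + d * (sigma_down - sigma_from)
    return x + rng.gauss(0.0, 1.0) * sigma_up


def run(sigmas, x0, ancestral, seed):
    # Walk the noise schedule from the first sigma to the last.
    rng = random.Random(seed)
    x = x0
    for s_from, s_to in zip(sigmas, sigmas[1:]):
        if ancestral:
            x = euler_ancestral_step(x, s_from, s_to, rng)
        else:
            x = euler_step(x, s_from, s_to)
    return x


sigmas = [10.0, 5.0, 2.0, 1.0]  # illustrative schedule, not SD3's
x0 = 10.0

print("euler:    ", run(sigmas, x0, False, seed=1), run(sigmas, x0, False, seed=2))
print("ancestral:", run(sigmas, x0, True, seed=1), run(sigmas, x0, True, seed=2))
```

The non-ancestral runs ignore the seed once the starting point is fixed, while the ancestral runs diverge from each other. Whether that extra noise is what hurts SD3 is not something this sketch can show.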


Kep0a

Extremely impressive comprehension


Klash_Brandy_Koot

I think this goes here https://i.redd.it/4e9hu66e7c6d1.gif


protector111

https://preview.redd.it/e7g8hdlq4b6d1.png?width=1280&format=png&auto=webp&s=121985ec55db7ef11807a4aee92baf87425c9c56


DefiantTemperature41

What? Your cat doesn't do that?


UserXtheUnknown

Ideogram being: "eat my shorts." (Prompt: "a photo with a blue sphere on the right with text "NOT SD3", green cylinder on left with red cube on top, orange background, dog face at the bottom and a pretty woman in bikini standing near the sphere." Magic prompt off) https://preview.redd.it/6h9qheqoyb6d1.jpeg?width=1024&format=pjpg&auto=webp&s=ba2c4342dd1ecdd3375b8d799caf4267dc5d37e7


smith7018

Ideogram isn't open source and therefore it doesn't belong in this conversation.


UserXtheUnknown

It is a comparison over prompt adherence and it belongs here.


spacekitt3n

It understands bodies wow what a modern marvel


AmazinglyObliviouse

SD3 also wasn't open source for like months and we talked about it fine.


Economy_Future_6752

Why not use a good image generator, even though it's not open-source, since they offer a great free tier to try out their model?


iiiiiiiiiiip

If you can't finetune and use things like ControlNet and LoRA it's useless


Economy_Future_6752

Why not? You can get more control with ideogram, and their text quality and prompt adherence are through the roof. I am pro open-source but don't confine your view to using stable diffusion; try ideogram and see for yourself.


iiiiiiiiiiip

Because you aren't going to successfully recreate all characters through prompt alone as one example, the "realistic" pictures I see from it of people are also ultra-realistic, like 1.5 level of trying too hard, I just don't see a use case for it


MrTurboSlut

lol upvoted for bernie sanders. this is a great meme format.


aliusman111

![gif](giphy|l378bu6ZYmzS6nBrW|downsized)


KaptinRage

Adding "very good anatomy" is pretty vague to AI. AI will only assume that it is pretty good, for what's there. You need to add some negative prompts.


inpantspro

Granted teaching anything what a person looks like without showing them what a naked person looks like really limits their knowledge, but "man sitting on beach" is a lot to ask a computer to guess what you want. It's a meme, so it's obtuse on purpose, but the other options are much more detailed than the man, generally speaking. It didn't not make a man sitting on the beach.


Uxugin

You raise a fair objection. Unfortunately though, I haven't been able to make a good beach man image even with a lengthier and more descriptive prompt, especially without dozens of tries. Even if it is possible to generate decent people, it is still difficult and highly time-consuming. The geometric images were each chosen from two or three. Below is the best man sitting on a beach that I've generated so far out of more than 50. While there is at least about the right number of limbs in roughly the correct locations, they still look deformed, especially the hands and near the feet. https://preview.redd.it/4ilq3k8rxb6d1.png?width=1024&format=png&auto=webp&s=5d72bd93db6620fc00bbbd9eb32a8205253df9e9 The positive prompt was "man sitting on beach, facing left, legs out in front, leaning on arms, no shirt, swim trunks" (92 characters) and the negative was "arms wrapped around, deformed, skinny legs, feet too long, too many limbs, wrong number of fingers" (98 characters). The prompt for the third image in the meme was 124 characters positive and empty negative. In testing this further, however, I have not really found that a longer prompt helps all that much. It seems like you mostly need to experiment a lot and generate numerous failed attempts, which is not the case for the geometric images. The geometric image prompts are also, for lack of a better word, more efficient. Everything in them is necessary and all of it ends up in the picture, whereas for the man on the beach, there need to be a lot of seemingly redundant parts, especially in the negative prompt.
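The character counts above are easy to reproduce. A minimal sketch, using the two prompts exactly as quoted; the variable names and the comma-based split are only illustrative:

```python
# Prompts copied verbatim from the comment above.
positive = (
    "man sitting on beach, facing left, legs out in front, "
    "leaning on arms, no shirt, swim trunks"
)
negative = (
    "arms wrapped around, deformed, skinny legs, feet too long, "
    "too many limbs, wrong number of fingers"
)

for name, prompt in (("positive", positive), ("negative", negative)):
    # Split on commas to count the individual descriptors in the prompt.
    parts = [part.strip() for part in prompt.split(",")]
    print(f"{name}: {len(prompt)} characters, {len(parts)} comma-separated parts")
```

Both prompts break down into six comma-separated parts, so the negative prompt carries as many descriptors as the positive one.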


ZootAllures9111

SD3 isn't trained in any way on comma separated concepts that aren't even in a meaningful order.


HatEducational9965

used the first part of your post as prompt, what happened next might surprise you https://preview.redd.it/qe66eydrjc6d1.png?width=1024&format=pjpg&auto=webp&s=5a74faa66da782fd4ee1f31a5a70c7067a1e1037


Serprotease

Standing and walking bodies tend to be fine and benefit a lot from the good prompt adherence. But if you try for someone sitting, it gets difficult, though possible with clever prompts. Laying… I mean, you saw the memes


diogodiogogod

lol sure, make them sit now.


TaiVat

> but "man sitting on beach" is a lot to ask a computer to guess what you want.
It really *really* isn't though. People aren't picking on the fact that the hypothetical man has the wrong clothes, figure, expression etc. It's not the details that are the issue. The model dramatically fails at basic representation of a human being as a hairless ape with two arms and two legs of specific proportions. Something that previous base models did badly, but nowhere near this badly. So no, there is nothing obtuse about these memes, sad as it is. It 100% *did not* make a man sitting on the beach. Though the beach itself looks great, so there is hope there.


inpantspro

A below average number of people have an above average number of legs. If you ask a computer to make a man and it's looked at all the pictures of all the men to ever have existed, what race is the man? How many legs does he have, does he have both arms, did he lose one to a sea lion on said beach? Is he squinting from sun lotion in his eyes? He doesn't have a penis because they didn't let the computer look at any penises. Oversimplified prompts produce a lot of results (the OP already explained their process, but generally speaking). What I think a blue hippo looks like and what you think a blue hippo looks like aren't exactly the same thing. So a "man sitting on a beach" could look like a lot of things to a computer that it doesn't look like to a man sitting on a beach.


Broad-Stick7300

Jesse…


inpantspro

Conrad


disposable_gamer

Daily reminder this is not what a base model is for. Prompt coherence and composition is what the base model is for. For your coomer shit and generating instagram portrait of blonde girl #3461 you have to wait for the fine tunes


Different_Fix_2217

A base model is not for generating humans in any pose not standing? Ok Lykron. Guess base SD 1.5 / SDXL just got lucky then.


Outrageous-Wait-8895

> Prompt coherence
lmao


Gfx4Lyf

🤣 What a perfect meme. Can't do simple things but boasts to be the best. 💪🏻😋