Hi there,
I’m a complete amateur at design and painting, but I got kind of hooked on Stable Diffusion because it (theoretically) lets me create images from my imagination without needing digital painting skills.
I poured a couple of weekends of free time into learning how to use SD, and by now I’m somewhat familiar with writing useful prompts and using ControlNet, inpainting, and upscaling.
But now I’m a bit at a loss about how to refine my workflow further. Right now I can either get really good images that only roughly resemble the scene I was going for (letting the model/LoRAs do the heavy lifting), or I get an image composed exactly as I want (leaning heavily on ControlNet) but poorly executed in the details, with distorted faces, ugly hands, and so on.
Basically, if I give a vaguer prompt the image comes out great, but the more specific I try to be, the more the generation feels “strangled” by the prompt and ControlNet, and it doesn’t seem to produce usable images …
How do you approach this? Do you generate hundreds of images in the hope that one of them captures your envisioned scene? Do you make heavy use of Photoshop/GIMP for post-processing (which I’d like to avoid)? Or do you painstakingly inpaint all the small details until everything fits?
Edit: Just to add a thought: I’m starting to realise how limited most models are in what they “recognise”. Everyday items are covered pretty well, e.g. prompting “smartphone” or “coffee machine” produces very good results, but things like “screwdriver” already get dicey, and with specialised terms like “halberd” it’s completely hopeless. It seems I’ll need to go through with training my own LoRA, as discussed in the other thread …
Very solid rundown, and I agree with most of this.
In particular, Latent Couple and Composable LoRA are amazing tools, and the OP should definitely look into them.