I just wanted to share some of my first impressions while using SDXL 0.9, along with a random image generated with it to shamelessly get more visibility. I would like to see if others had similar impressions, or if your experience has been different.

  • The base model, when used on its own, is good for spatial coherence: it basically prevents the generation of multiple subjects in bigger images. However, the result is generally “low frequency”. For example, a full 1080x1080 image looks more like a lazy linear upscale of a 640x640 in terms of visual detail.
  • The detail model (the refiner) is not good for spatial coherence when starting from a random latent. When used directly as a normal model, results are pretty much like those we get from good-quality SD1.5 merges. However, since it has been co-trained to use the same latent space representation, we get the power of latent2latent in place of img2img upscaling techniques.
  • The detail model seems to be strongly biased, and that bias carries into the final generation. From what I can see, all nude images in their training set are “censored”, in the sense that they hand-picked high-quality photos of people wearing some degree of clothing.
  • While the two models share the same latent space, they do not converge to the same image in generation. A face generated with the base model will be strongly altered by the latent2latent detail-injection phase. As I said, I found the detail model very biased, which is potentially a big problem in generation: for example, all faces I tried to generate converge towards more “I am a model” ones, often with issues capturing a specific ethnicity. I can see this being a bit of a problem when training LoRAs.

What are your experiences? Have you encountered other issues? Things you liked?

  • wsippel@kbin.social · 1 year ago

    SDXL 0.9 seems absolutely amazing so far. It’s so much better at following instructions than any other SD foundation model it’s not even funny, and it can do tons of stuff out of the box that would require at least an embedding with SD1.5. One thing I immediately noticed is that it handles color instructions properly most of the time. You can define tons of object colors, and it’ll usually apply each color only to the specified object instead of bleeding it onto everything else. I also tried things like a character in a dirty environment: SD1.5 and its finetunes would often make the character dirty, while SDXL follows the instruction properly. Incredible potential.

    When it comes to the refiner, I found that the recommended(?) 0.25 strength works well for environments and such, but for characters it should be dialed way down. I still use it, at around 0.05, and that seems to do the trick. Even at such a low strength it still does what it’s supposed to, with a profound effect on fine detail like hair, but it no longer changes the base generation nearly as much.
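
    For anyone who wants to try this outside a node graph, here is a minimal sketch of the base-then-refiner pass using Hugging Face diffusers, where the img2img `strength` plays the role of the 0.25 / 0.05 values discussed above. The 0.9 model IDs are assumptions (the research weights were gated), and this is only an approximation of the idea, not the commenter's actual setup.

    ```python
    import torch
    from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

    # Assumed 0.9 checkpoints; swap in whatever weights you actually have access to.
    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16
    ).to("cuda")
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "portrait photo of a woman standing in a dirty workshop"

    # Full generation with the base model, kept as latents so the refiner
    # works directly in the shared latent space.
    base_latents = base(prompt=prompt, output_type="latent").images

    # Refiner as a latent img2img pass: ~0.25 strength for environments,
    # ~0.05 for characters, so hair and skin detail improve without the
    # composition or the face being rewritten.
    image = refiner(prompt=prompt, image=base_latents, strength=0.05).images[0]
    image.save("refined.png")
    ```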

    • karurochari@lemmy.dbzer0.com (OP) · 1 year ago

      Yes, I had to tune it down as well.
      I actually ended up with a different workflow from the suggested one, as I think that one is a bit too wasteful. Instead of generating the full image and then using latent2latent to re-introduce noise into the final version, I stop the base generation at an intermediate step and finish it with the refiner model (see the sketch below). I did the same in the past to combine different SD1.5 checkpoints, and it works here as well, since the latent space is shared across the two models.
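
      In diffusers terms, this hand-off roughly corresponds to the `denoising_end` / `denoising_start` parameters of the SDXL pipelines. The sketch below is an assumed translation of the idea to that API, not the exact workflow in the attached image, and it reuses the same assumed 0.9 model IDs as the earlier sketch.

      ```python
      import torch
      from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

      base = StableDiffusionXLPipeline.from_pretrained(
          "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16
      ).to("cuda")
      refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
          "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16
      ).to("cuda")

      prompt = "a rainy street at night, cinematic lighting"
      steps, handoff = 40, 0.8  # base model handles the first 80% of the schedule

      # The base model stops early and returns partially denoised latents
      # instead of a finished image, so no steps are wasted.
      latents = base(
          prompt=prompt,
          num_inference_steps=steps,
          denoising_end=handoff,
          output_type="latent",
      ).images

      # The refiner resumes from the same point in the shared latent space
      # and finishes the remaining 20% of the denoising.
      image = refiner(
          prompt=prompt,
          num_inference_steps=steps,
          denoising_start=handoff,
          image=latents,
      ).images[0]
      image.save("handoff.png")
      ```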

      I added an image with the alternative workflow in case someone wants to try it (hopefully the metadata is preserved).

  • pablonaj@feddit.de · 1 year ago

    Hey, thanks for sharing your impressions.

    I haven’t tested it yet. Are you using it with ComfyUI?

    From what I understand, both models can be trained, so the current bias may not be a big issue once people start customising them.

    Have you used the bot on Discord? If so, are you finding any big differences between this and the bot? How is it at following prompts?

    • karurochari@lemmy.dbzer0.com (OP) · 1 year ago

      Yes, I think this kind of “explorative evaluation” would not be possible in automatic1111.
      From what I recall, it does not really give much control over the generation pipeline to the final user.
      Admittedly, it has been a while since I last used it, and I have no idea how good or flexible the SDXL integration is.

      “From what I understand both models could be trained”

      Yes, that is also my understanding.
      Compared to the original SD1.5 it has so much more potential for further extension, and I am confident many of these issues can be ironed out by the community.
      And out of the box, base SDXL is much better than base SD1.5; I am quite positive about that 😄.

      No, I never used the bot on Discord.
      As for the prompts, I still need to really understand how to best write them.
      So far, I mostly used the same style I adopted in SD1.5 (without the Danbooru tags since they are clearly not supported).
      I tried to be a bit more “expressive” but I have not really seen much of an improvement.
      And words are still “bleeding”, so “red eyes” will often also generate extremely red lips or red clothes.

      • pablonaj@feddit.de · 1 year ago

        I had been using the bot on Discord a lot for a while. The prompting is quite different from what it was in 1.5 and 2.1; you should check the Pantheon channel on Discord for a good idea of the prompts that gave good results. Basically, my experience is that it works very well with simple prompts, and if you want you can also get very descriptive and still get good results. Negative prompts are almost never needed in the bot, but I think this may be because they were already injecting them, so the leaked 0.9 may need them.