
Stable Diffusion: Difference between revisions

From InstallGentoo Wiki v2
>WeeabooFromHell
* [https://github.com/rom1504/clip-retrieval clip-retrieval]: Project that lets you determine the relationship between images and keywords, works in either direction. Online version [https://rom1504.github.io/clip-retrieval/ here]
* [https://mega.nz/folder/oRM1xAAJ#MeZYuu-lkKNMC3fgrvhsmw Archive of samples produced by individual keywords]
* [https://artsandculture.google.com Google Arts & Culture]: can be used to discover [https://artsandculture.google.com/category/artist artists], [https://artsandculture.google.com/category/art-movement art movements], [https://artsandculture.google.com/category/medium mediums], etc.
[[Category:Software]]
[[Category:HowTo]]

Revision as of 05:49, 25 August 2022

Stable Diffusion is an open-source diffusion model for generating images from textual descriptions. Note: as of writing there is rapid development both on the software and user side. Take everything you read here with a grain of salt.

How to Use

Usage instructions for both online and local use.

Getting started

gradio

gradio is a graphical user interface for generating images locally with Stable Diffusion. A short explanation of what the options for txt2img do:

  • Prompt: textual description of what you want to generate.
  • Sampling Steps: diffusion algorithms work by making small steps from random noise towards an image that fits the prompt. This sets how many such steps are taken. More steps give diminishing returns.
  • Sampler: which sampling algorithm to use. Use k-diffusion if you're unsure.
  • Skip sample save: when ticked, do not save individual images to disk.
  • Skip grid save: when ticked, do not save a grid of all images at the end.
  • Increment seed: when ticked, explicitly set the seed with each generation iteration. This makes it possible to recreate a specific image that you encounter in a larger run.
  • DDIM ETA: amount of randomness when using DDIM.
  • Sampling Iterations: how many times to generate a set of images.
  • Samples Per Iteration: how many images to generate at the same time. Increasing this value can improve performance but also requires more VRAM. The total number of images is this value multiplied by Sampling Iterations.
  • Classifier-free Guidance Scale: how strongly the images match your prompt. Increasing this value will result in images that resemble your prompt more closely (according to the model), but it also degrades image quality after a certain point.
  • Seed: starting point for RNG. Keep this the same to generate the same (or almost the same) images multiple times.
  • Width: width of individual images in pixels. Increasing this value requires more VRAM. Image coherence on large scales becomes worse as the resolution increases.
  • Height: same as Width but for individual image height. The aspect ratio influences the content of generated images; if height is greater than width you get, for example, more portraits, while you get more landscapes if width is greater than height.
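
How Sampling Iterations, Samples Per Iteration, and Seed fit together can be sketched in a few lines of Python. This is only an illustration: plan_run is a made-up helper, not part of gradio, and the idea that the seed goes up by one per iteration is an assumption about how Increment seed behaves, not something taken from the actual code.

```python
def plan_run(sampling_iterations, samples_per_iteration, seed, increment_seed=True):
    """Return (total_images, seeds) for a run. Hypothetical helper, not a gradio API."""
    # Total number of images = Sampling Iterations * Samples Per Iteration.
    total_images = sampling_iterations * samples_per_iteration
    if increment_seed:
        # ASSUMPTION: the seed is increased by 1 for every iteration.
        seeds = [seed + i for i in range(sampling_iterations)]
    else:
        # Only the starting seed is set explicitly.
        seeds = [seed]
    return total_images, seeds


total, seeds = plan_run(sampling_iterations=3, samples_per_iteration=4, seed=1000)
print(total)  # 12
print(seeds)  # [1000, 1001, 1002]
```

Under that assumption, an image from the second iteration of the run above could be recreated by running a single iteration with the seed set to 1001.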

Example Prompts

Cherrypicked result (best of 9)

Baseline prompt for a photorealistic drawing of the face of a conventionally attractive woman:

thick lips, black hair, fantasy background, cinematic lighting, highly detailed, sharp focus, digital painting, art by junji ito and WLOP, professional photoshoot, instagram

Prompt Design

Guidelines for creating better prompts.

What To Write

Write text that would be likely to accompany the image you want. Typically this means that the text should simply describe the image. But this is only half of the process because a description is determined not just by the image but also the person writing the description.

Imagine for a moment that you were Chinese and had to describe the image of a person. Your word of choice would likely no longer be "person" because your native language would be Chinese and that is not how you would describe a person in Chinese. You wouldn't even use Latin characters to describe the image because the Chinese writing system is completely different. At the same time, the images of people that you would be likely to see would be categorically different; if you were Chinese you would primarily see images of other Chinese people. In this way the language, the way something is said, is connected to the content of images. Two terms that theoretically describe the same thing can be associated with very different images and any model trained on these images will implicitly learn these associations. This is very typical of natural language where there are many synonymous terms with very different nuances; just consider that "feces" and "shit" are very different terms even though they technically describe the same thing.

TLDR: when choosing your prompt, think not just about what's in the image but also who would say something like this.

Prompt Length

Be descriptive. The model does better if you give it longer, more detailed descriptions of what you want. Use redundant descriptions for parts of the prompt that you care about.

Note, however, that there is a hard limit on the length of prompts. Everything after a certain point - 75 or 76 CLIP tokens depending on how you count - is simply cut off. As a consequence it is preferable to use keywords that describe what you want concisely and to avoid keywords that are unrelated to the image you want. Words that use non-ASCII Unicode characters (for example Japanese characters) require more tokens than words that use only ASCII characters.
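
A rough sketch of where the limit comes from: the CLIP text encoder takes 77 positions, and two of those are used for the start and end markers, which leaves 75 for the prompt itself. The code below does not run the real tokenizer. It pretends that one keyword is one token (an assumption made purely for illustration), and the helper names are made up. The UTF-8 byte count is only a hint at why Japanese text is more expensive, under the assumption that the tokenizer works on bytes; it is not a token count.

```python
CLIP_CONTEXT_LENGTH = 77  # size of the CLIP text encoder's input window
SPECIAL_TOKENS = 2        # start-of-text and end-of-text markers
MAX_PROMPT_TOKENS = CLIP_CONTEXT_LENGTH - SPECIAL_TOKENS  # 75


def split_at_limit(tokens, limit=MAX_PROMPT_TOKENS):
    """Split a token list into the part that is kept and the part that is cut off."""
    return tokens[:limit], tokens[limit:]


def utf8_bytes(text):
    """Number of UTF-8 bytes in text; a rough hint at token cost, NOT a token count."""
    return len(text.encode("utf-8"))


# ASSUMPTION: one keyword = one token. Real prompts need the actual CLIP tokenizer.
fake_tokens = [f"keyword{i}" for i in range(80)]
kept, dropped = split_at_limit(fake_tokens)
print(len(kept), len(dropped))                    # 75 5
print(utf8_bytes("anime"), utf8_bytes("アニメ"))  # 5 9
```

To count tokens for a real prompt you would have to run it through the same tokenizer that the model uses.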

Punctuation

Use it. Separating keywords by commas, periods, or even null characters ("\0") improves image quality. It's not yet clear which type of punctuation or which combination works best - when in doubt just do it in a way that makes the prompt more readable to you.

Emphasis

The common wisdom is that putting a keyword in square brackets or appending an exclamation mark increases its effect while putting a keyword in round brackets decreases its effect; using more brackets or exclamation marks results in a stronger change. However, when this was tested with simple test prompts this effect could not be observed. Specifically, someone made short, simple test prompts that specify two different things and tested how the image changes if one of those things is strengthened with [] while the other thing is weakened with (). The test cases were flowers being red or blue and a woman being a doctor or a vampire. The specific prompts and samples are in the samples archive linked below.

The repetition of a certain keyword did work to increase its effect.
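
If you want to run this kind of comparison yourself, the prompt variants can be generated with a small script. The sketch below is just one way to do it: emphasize and build_prompt are made-up helper names, and the generated prompts are illustrative, not the prompts from the samples archive.

```python
def emphasize(keyword, style, level=1):
    """Write keyword with the given emphasis style; level 0 returns it unchanged."""
    if style == "square":
        return "[" * level + keyword + "]" * level
    if style == "round":
        return "(" * level + keyword + ")" * level
    if style == "exclaim":
        return keyword + "!" * level
    if style == "repeat":
        # level extra copies of the keyword, separated by commas
        return ", ".join([keyword] * (level + 1))
    raise ValueError(f"unknown style: {style}")


def build_prompt(keywords):
    """Join keywords with commas, as suggested in the Punctuation section."""
    return ", ".join(keywords)


for style in ("square", "round", "exclaim", "repeat"):
    print(f"{style}: {build_prompt([emphasize('red', style, 2), 'flowers'])}")
# square: [[red]], flowers
# round: ((red)), flowers
# exclaim: red!!, flowers
# repeat: red, red, red, flowers
```

Keep the Seed and all other settings the same across the variants so that any difference between the images comes from the prompt alone.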

Image Content

If you want your image to contain specific things: the less abstract your wording is the better. If at all possible, avoid wording that leaves room for interpretation or that requires an "understanding" of something that is not part of the image. Even concepts like "big" or "small" are problematic because they are indistinguishable from objects being close to or far from the camera. Ideally use wording that has a high likelihood of appearing verbatim in a caption of the image you want.

Miscellaneous

  • Unicode characters (e.g. Japanese characters) work.
  • Capitalization does not matter.

Keywords

The most reliable way to find good keywords is to look at the keywords that are used to generate images that are similar to what you want. Below are some (unconventional) known good keywords (as determined by using keywords as prompts without other keywords or in very short and simple prompts). The underlying assumption is that the keywords will also be good as part of large prompts; if they are not, please provide feedback.

Weebshit

Anime and other Japanese things:

  • "anime": generic, mediocre anime-style images, looks somewhat like the 2000s. Since "anime" is associated with many low-quality images a common strategy is to just specify a drawing and use Japanese words in your prompt to associate your prompt with what a Japanese person would be likely to draw (i.e. anime). For style variations try "アニメ" (Japanese way to write anime, looks more modern), "chibi", "Kyoto Animation", "light novel illustration", "shonen", "Studio Ghibli", "visual novel CG", or "Yusuke Murata" (artist of the One-Punch Man manga). Avoid "manga", "tankobon", and "waifu". Order of keywords is simply alphabetical.
  • "ikemen": handsome Japanese men. Avoid "イケ面" (Japanese spelling).
  • "Gothic Lolita": frilly black dresses.
  • "oneshota": cute anime boys.
  • "Sweet Lolita": frilly pink dresses.
  • "Touhou", "Touhou Project": characters from the franchise. Avoid "東方".
  • "waifu": modern Japanese women.
  • "Zettai Ryouiki": short skirt in combination with stockings or socks, visible thighs. Avoid "絶対領域" (kanji spelling).
  • "美女", "美人": Japanese women, classical beauty standard.
  • "巨乳", "爆乳", "おっぱい": Japanese women with large breasts, either topless or wearing a bra.

Miscellaneous

  • "bobs and vagene": do not redeem the prompt
  • "E=mc2": Albert Einstein.
  • "r/aww": cute animals.
  • "r/Fitness": muscular women.

Useful Links