List

Chapter 1 Let's learn about image generation AI

Chapter 2 Let's start building the environment

2-1 Let's prepare the environment to use Stable Diffusion
2-2 Let's build the environment using Google Colab
2-3 Let's set up Stability Matrix in a local environment
2-4 Let's create images with simple words
2-5 Download the model
2-6 Download the VAE

Chapter 3 Let's create images with prompts

Chapter 4 Let's create images using images

Chapter 5 Let's use ControlNet

Chapter 6 Let's create and use LoRA

6-1 Let's learn what we can do with additional learning
6-2 Let's create an image using LoRA
6-3 Create your own dedicated painting style LoRA
6-4 Let's create various types of LoRA
6-5 Let's evaluate the learning content

Chapter 7 Let's use the image generation AI more

>>> Understand the role of prompts

As briefly explained in Chapter 2, Stable Diffusion can generate an image from text input alone. The text that gives instructions to the AI is called a prompt. To create the images you intend, you need to know how to write effective prompts.

>>> Consider the components of the image you want to create

First, clarify the image you want to create. It helps to decide on the setting, the purpose, and the characteristics of any characters that appear. As an example, let's create an illustration of 'A witch sitting in a flower garden with a castle in the distance, smiling at the viewer.' Let's divide it into subject, background, and other categories, and write down the elements this illustration needs.

> Subject information

Girl, witch, blue robe, witch hat, long black hair, sitting, smiling, looking at the viewer, whole body visible

> Background information

Flower garden, blue sky, castle, another world

> Camera, lighting, style, composition, etc. information

Fantasy, vivid colors, daytime, clear

In this way, put whatever comes to mind into words. Anything you can express in words, even the smallest detail, can go into the prompt, so write it down in as much detail as possible. Then convert it into a prompt.

>>> Let's learn the prompt rules

Prompts used in Stable Diffusion follow some rules. The first two to know are: 1) write in English, and 2) insert a comma (,) to separate words or phrases. For example, to generate one girl you can write the prompt '1girl', but to add one boy you have to write '1girl, 1boy'.
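The comma rule can be sketched in a few lines of Python. This is only an illustration of how a comma-separated prompt breaks down into individual tags and can be rebuilt. The helper names `split_prompt` and `join_tags` are hypothetical and are not part of Stable Diffusion or any web UI.

```python
def split_prompt(prompt):
    """Split a comma-separated prompt into tags, dropping empty pieces."""
    return [tag.strip() for tag in prompt.split(",") if tag.strip()]


def join_tags(tags):
    """Join tags into one prompt string, separated by a comma and a space."""
    return ", ".join(tag.strip() for tag in tags if tag.strip())


# Hypothetical usage: start with one girl, then add one boy.
tags = split_prompt("1girl")
tags.append("1boy")
print(join_tags(tags))
```

A trailing comma or stray spaces in the input are tolerated, because empty pieces are dropped before the tags are joined again.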

>>> Let's write a prompt

> Subject information

1girl, witch, blue robe, hat, long black hair, sitting, smile, looking at viewer, full body,

> Background information

flower garden, blue sky, castle,

> Camera, lighting, style, composition, etc. information

fantasy, vivid color, noon, sunny

The prompts above are just examples, so once you are used to them, try splitting the phrases and testing the results. For example, the prompt 'long silver hair' can be split into 'long hair, silver hair'. Also, 'looking at viewer' above is an expression that directs the gaze toward the viewer; prompts have customary expressions like this of their own. Convert the elements you want to include into prompts and enter them.

>>> Add a quality prompt

There are quality prompts such as 'masterpiece' and 'high quality'. They can improve the overall finish of generated images. However, depending on the model, quality prompts may be unnecessary or even counterproductive. Check the model information or compare generated images to decide whether to use them. Now let's combine the prompts written so far and add quality prompts.

Prompt

masterpiece, high quality, 1girl, witch, blue robe, hat, long black hair, sitting, smile, looking at viewer, full body,

flower garden, blue sky, castle, fantasy, vivid color, noon, sunny,
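If you keep each category as its own list, assembling the final prompt with or without quality prompts takes only a few lines. The following is a minimal sketch: the function `build_prompt` and the variable names are assumptions made for illustration, and only the tags themselves come from the text above.

```python
# Tags from the example, grouped by category.
QUALITY = ["masterpiece", "high quality"]
SUBJECT = ["1girl", "witch", "blue robe", "hat", "long black hair",
           "sitting", "smile", "looking at viewer", "full body"]
BACKGROUND = ["flower garden", "blue sky", "castle"]
STYLE = ["fantasy", "vivid color", "noon", "sunny"]


def build_prompt(subject, background, style, quality=()):
    """Concatenate the categories in order, with quality tags first."""
    tags = list(quality) + list(subject) + list(background) + list(style)
    return ", ".join(tags)


with_quality = build_prompt(SUBJECT, BACKGROUND, STYLE, quality=QUALITY)
without_quality = build_prompt(SUBJECT, BACKGROUND, STYLE)
print(with_quality)
print(without_quality)
```

Generating once with each string makes it easy to compare whether the quality prompts actually help on the model you are using.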

Where does the effect of 'high quality' and the other so-called quality prompts come from? As previously explained, Stable Diffusion learns combinations of language and images from all over the world through CLIP. More specifically, OpenCLIP was trained on LAION-2B, the roughly 2.32 billion English-language pairs within LAION-5B, a dataset of 5.85 billion CLIP-filtered 'text-image pairs' created by the German non-profit organization LAION.

What's interesting here is that in the original data, a cat image, for example, may be labeled only with the tag 'cat' (a word that a human wrote in the image's ALT attribute in HTML). It is possible to learn from tags such as white, fur, and fluffy as well, but a model that relies only on teacher data with such specific words learns only that one narrow result.

The ingenuity of CLIP lies in being a simple yet massive model that, 'given an image, selects the most similar text' to solve classification problems, without any task-specific optimization. It has 'zero-shot transferability' (it can handle distributions that are not in the training data), which allows image classification even without training data for the specific task.

Therefore, given only the prompt 'cat', it can output features common to cat images.
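The idea of 'selecting the most similar text for a given image' can be illustrated with a toy example. The sketch below does not call CLIP: the three-dimensional vectors are made-up stand-ins for the embeddings a real model would produce, and the function names are hypothetical. It shows only the selection step, which picks the caption whose vector has the highest cosine similarity to the image vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, text_embeddings):
    """Return the caption whose embedding is most similar to the image."""
    return max(
        text_embeddings,
        key=lambda caption: cosine_similarity(image_embedding,
                                              text_embeddings[caption]),
    )


# Made-up embeddings for illustration only (a real model uses hundreds of dimensions).
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.2],
    "a photo of a dog": [0.1, 0.9, 0.2],
    "a photo of a castle": [0.1, 0.2, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]  # pretend this came from a cat picture

best = zero_shot_classify(image_embedding, text_embeddings)
print(best)
```

Because the candidate captions are just text, new categories can be added without retraining, which is the sense in which the classification is 'zero-shot'.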

What if a tag indicating 'high quality' is attached during training? High quality has no clear features the way white, fur, and fluffy do. Rather, the model may learn the features of 'high quality photos', of 'high quality animated character images', and of 'high quality, realistic cat images'. The features common to all of these may be a 'common aesthetic' that we sense unconsciously. CLIP can be seen as having acquired concepts and relationships of aesthetic elements, such as the golden ratio in layout theory, or the average female face that always appears when you enter the prompt 'masterpiece'.

Therefore, the prompt 'high quality' can be said to specify the average of the features common to images tagged 'high quality'. CLIP can also output a score for detecting harmful content, known as NSFW (not safe for work).

>>> Let's verify the order of the prompts

Prompts have an effective order. First, place the content you most want in the output, and other important items, as early as possible. It is also important to list categories in the order that works best for the model. The effect of a prompt varies with the model you use, so check it by actually testing.

Here, we ran a test generating images with blue-pencil-XL-v.0.0.03.safetensors. We kept all conditions the same except the prompt, and varied the order of four categories: subject '1girl', environment 'castle', quality 'masterpiece', and style 'anime', then compared the generated images. To generate and compare multiple images while changing only some of the generation conditions, use the X/Y/Z plot.

At the very bottom of the Generation tab is the Script menu. From the Script drop-down menu, select X/Y/Z plot.

This time we will change the order of the prompts, so select X type: Prompt order. This option automatically reorders the prompts for comparison. Then enter the prompts you want to compare as X values: masterpiece, anime, 1girl, castle. As with ordinary image generation, press the Generate button; the prompt order changes automatically and an image is generated for each ordering.
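To get a feel for how many images such a comparison involves, the orderings can be enumerated in Python. This is a sketch under the assumption that Prompt order tries every permutation of the listed items. It does not call the web UI, and `prompt_orderings` is a hypothetical helper name.

```python
from itertools import permutations


def prompt_orderings(tags):
    """Return every ordering of the tags as a comma-separated prompt."""
    return [", ".join(order) for order in permutations(tags)]


orderings = prompt_orderings(["masterpiece", "anime", "1girl", "castle"])
print(len(orderings))  # four tags give 4! = 24 orderings
for prompt in orderings[:3]:
    print(prompt)
```

Since the number of orderings grows factorially, a fifth tag would already mean 120 images, so it is practical to keep the list of compared prompts short.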

With X/Y/Z plot, the values applied as variables are labeled on the grid image, and the generated images are laid out side by side for easy comparison. This helps improve the quality of your results. Judging by how little each image breaks down, how high its quality is, and how well it reflects the intended mood, we can see that for the model used this time the effective order is quality > style > subject > environment.

