>>>What is ControlNet?
ControlNet is an extension of Stable Diffusion that was introduced in the February 2023 paper "Adding Conditional Control to Text-to-Image Diffusion Models". With it, you can supply an image or a pose in addition to the prompt, making it possible to generate complex compositions that the prompt alone cannot specify, poses that are difficult to describe in text, and images that reproduce the characters in an original photo.
The technology was created by lllyasviel (Lvmin Zhang), the main author of the paper, and as of March 2024 he has released the details of the technique and the models on GitHub and Hugging Face.
GitHub - lllyasviel / ControlNet-v1-1-nightly
https://github.com/lllyasviel/ControlNet-v1-1-nightly
Additionally, Mikubill has created an extension to allow ControlNet to be used in AUTOMATIC1111 and released it as open source.
GitHub - Mikubill / sd-webui-controlnet
https://github.com/Mikubill/sd-webui-controlnet
ControlNet is a neural-network technique that adds spatial conditioning control to Stable Diffusion's diffusion model. It provides several types of 'preprocessors', such as openpose, which extracts poses from images, and canny, which extracts outlines; the extracted information is then used as a conditioning signal for image generation with txt2img. By choosing the preprocessor that matches your purpose, you can control compositions and poses that were difficult to handle with txt2img alone and create images as you intended.
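The same preprocess-then-condition flow can also be seen outside the web UI. Below is a minimal sketch using the Hugging Face diffusers library; the file names are illustrative assumptions, and this book's own setup uses the AUTOMATIC1111 extension rather than Python code.

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Preprocessor step: extract outlines with canny.
image = np.array(Image.open("input.png"))   # hypothetical input file
edges = cv2.Canny(image, 100, 200)
edges = np.stack([edges] * 3, axis=-1)      # single channel -> 3-channel image
control_image = Image.fromarray(edges)

# Attach a ControlNet trained on canny edges to a Stable Diffusion pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The extracted edges act as a spatial condition alongside the text prompt.
result = pipe("a portrait, best quality", image=control_image).images[0]
result.save("output.png")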
>>>Know the difference between img2img and ControlNet
When it comes to generating a new image from an existing image and a prompt, img2img may come to mind, but img2img and ControlNet are completely different technologies.
While img2img takes the features of the entire input image into account when generating, ControlNet first analyzes the input image with a preprocessor and extracts only the specific element that preprocessor is responsible for (edges, pose, depth, and so on), then uses that as the condition for generation. For example, you can reproduce only the pose from the input image, as shown below.
[Figure: the original image (source: Pexels) and the pose extracted from it with the openpose preprocessor]
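To see what the openpose preprocessor actually produces, here is a minimal sketch using the controlnet_aux package; the package and the "lllyasviel/Annotators" weights are assumptions for illustration, since the web UI extension runs its own bundled copy of the detector.

from PIL import Image
from controlnet_aux import OpenposeDetector

# Download the pose-detection weights published by lllyasviel.
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

# Only the skeleton (joint positions) is kept; colors, background, and
# every other feature of the original photo are discarded.
pose_image = openpose(Image.open("original.png"))   # hypothetical input file
pose_image.save("pose.png")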