
Monday, February 17, 2025

1-2 The Birth and Evolution of Image Generation AI (Stable Diffusion Practical Guide Table of Contents)

1-2 The Birth and Evolution of Image Generation AI

The history of AI image generation parallels the history of AI research and of computer-generated imagery, or computer graphics (CG). Let's trace the flow of development by looking back at major events and changes in our society. Topics you have heard of before will come up along the way, and I hope that connecting them with what you already know deepens your understanding.

Evolution of Image Generation AI Research (1940-2020)

Key research findings

Flow of AI research

1940

artificial neuron

▼Cellular automata proposed (Ulam & von Neumann: 1940s)
▼Research on artificial neurons (Warren McCulloch & Walter Pitts: 1943)

1950

Single layer perceptron

▼SNARC, the world's first neural-network learning machine, built (Minsky: 1951)
▼The term "artificial intelligence" was coined at the Dartmouth conference (McCarthy: 1956)
▼Simulation studies of pattern recognition with perceptrons gain popularity (Rosenblatt: 1958)

1970

Multilayer perceptron

▼The first AI boom ended when it was shown that single-layer perceptrons cannot identify linearly inseparable patterns (Papert & Minsky: 1969)

▼Conway's Game of Life brings cellular automata back into the spotlight (1970~)
▼Neocognitron (Fukushima: 1979)

1980

Expert System

1990

Error backpropagation

▼Error backpropagation method (Rumelhart: 1986)

2000

Deep Learning

▼Autoencoder (Hinton: 2006)
▼CUDA 1.0 released (2007)

2010

Diffusion model

▼GAN: Generative Adversarial Network (Goodfellow: 2014)
▼VAE: Variational Autoencoder (Kingma: 2014)
▼Diffusion model (Sohl-Dickstein: 2015)

2020

Transformer

In the 1940s, mathematicians and neurophysiologists presented mathematical models that imitated biology, such as artificial neurons and cellular automata. At the time, however, there were no computers that could run these models, so calculators built from analog electronic circuits were used instead. Notably, looking at how calculators were developed then, display technology for showing images grew up as part of this same research into calculating (computing) machines.

Around 1950, technology for drawing dots and vectors (points and lines) on cathode ray tubes was developed alongside technology for printing characters, making it possible to create simple shapes and patterns. This technology became the basis of the 'pixel' used in modern TV and smartphone displays, and it later became possible to store red/green/blue (RGB) information in image files such as PNG files, allowing images to circulate as data on the Internet and in print.

▲The word 'pixel', the unit that makes up an image, was coined by a researcher at NASA's Jet Propulsion Laboratory (JPL).

The 1950s through the 1970s were the period of the 'first artificial intelligence boom'. Simulations of 'artificial neurons' modeled on the brain and nervous system were run on calculators, and practical implementations in electronic circuits were actively pursued. Among these, the pattern recognition simulations using the 'perceptron', the ancestor of today's machine learning, were a landmark event.

Around this time, researchers coined the term artificial intelligence (AI). The concept also became widely known to the public through works such as <I, Robot> by science fiction writer Isaac Asimov and <Astro Boy> by Osamu Tezuka. In industry, research and development of transistors and integrated circuits flourished, TV broadcasting moved from black-and-white to color, various display devices were released, and visual art and media technology developed as well.

Evolution of Image Generation AI Research (1940-2020)

Key research findings

Cultural background related to CG/AI

1940

artificial neuron

▼TV broadcasting begins in the United States (1941)

1950

1960

Single layer perceptron

▼Asimov's <I, Robot> published (1950)
▼Osamu Tezuka releases Astro Boy (1950)
▼Bezier curve presentation (1959)
▼Color TV first broadcast in Japan (1960)
▼The world's first head-mounted graphic display (Minsky: 1963)
▼Conversational program 'ELIZA' (1964)
▼Movie <2001: A Space Odyssey>(1968)

1970

Multilayer perceptron

▼Xerox Alto features bitmapped GUI and mouse (1973)
▼Release of the movie <Star Wars> (1977)
▼Taito's 'Space Invaders' (1978)

1980

Expert System

▼Silicon Graphics, Inc. (1981–2009)
▼Release of the movie <Tron> (1982)
▼Family Computer released (1983)

1990

Error backpropagation

▼Photoshop 1.0 (1990)

▼NVIDIA established (1993)

▼CG dinosaur movie <Jurassic Park> released (1993)

▼RETAS, the prototype of CLIP STUDIO PAINT, introduced by Toei Animation (1993)

▼PlayStation 1 released (1994)

▼'Deep Blue' wins chess (IBM:1997)

▼Movie <The Matrix> released (1999)

2000

Deep Learning

▼PlayStation 2 released (2000)

▼Amazon service opened (2000)

▼YouTube service opened (2005)

▼iPhone released (2007)

▼Hatsune Miku (2007)

▼pixiv service opened (2007)

▼Android early version (2008)

▼MMD early version (2008)

2010

Diffusion model

▼Full transition to Adobe subscription model (2011)

2020

Transformer

In the 1970s, personal computers for general users appeared. Xerox's Alto introduced a graphical user interface (GUI) and mouse, laying the foundation for the later Mac OS and Windows. Taito's 'Space Invaders' arrived in neighborhood arcades, and 'TV games' that drew varied graphics on cathode-ray-tube TVs using semiconductor devices became available at home.

In the 1980s, Intel developed fourth-generation computers with essentially the same structure as today's CPUs, and these machines spread explosively. In artificial intelligence research, experts in robotics, image processing, and control systems led the development of artificial neural networks, and the announcement of the multilayer perceptron and error backpropagation laid the groundwork for today's deep learning.

In terms of industry, knowledge-based AI called 'Expert System' led the 'second artificial intelligence boom' and attracted huge investments.

Following the release of the movie <Star Wars> in 1977, <Tron> was released, and in 1990 Adobe brought 'Photoshop', developed by a researcher with deep ties to the American film industry, to the graphics market. In addition, with Nintendo's release of the 'Family Computer', game graphics grew significantly in both the content and design markets.

The 1990s were also a time of graphics research and market expansion. Blockbuster movies that actively utilized CG, such as <Jurassic Park> and <The Matrix>, were released, and as OSs that emphasized GUIs such as Mac and Windows became widespread, personal computers and software also became popular.

RETAS, the prototype of CLIP STUDIO PAINT, was introduced by Toei Animation, and professional artists also attempted digital drawing using scanners and stylus pens. In 1997, 'Deep Blue', a chess-only supercomputer developed by IBM, won against world champion Kasparov, drawing much attention.

In the 2000s, the era of 3D CG and the Internet arrived. With the release of the PlayStation 2 and continued advances in CPUs, services built around user-generated content (UGC) became widespread. YouTube, Nico Nico Douga, and pixiv opened their services, and MMD (MikuMikuDance), which lets Hatsune Miku and other 3D character models move, was released.

The 2010s that followed were the decade when smartphones became ubiquitous. Remarkably, everyone now carried a computer whose performance was incomparably greater than the machines of the 1940s. And as tools that assist game development, such as Unity and Unreal Engine, greatly expanded the market, building game graphics by programming them from scratch became very rare.

Recent Trends in Image Generation AI (2014-2024)

2014

▼GAN: Generative Adversarial Network (Goodfellow)

▼ VAE: Variational Autoencoder (Kingma)

Researchers with great influence

Edgar Simo-Serra (Waseda University)

Image understanding, restoration, illustration, sketch simplification, colorization → 'style learning'

▼Discriminative Learning of Deep Convolutional Feature Point Descriptors (2015)

▼Let there be Color!: Joint End-to-End Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification (2016)

▼Globally and locally consistent image completion (2017)

2015

▼Chainer 1.0 Announcement

▼Diffusion Model (Sohl-Dickstein)

▼TensorFlow Beta Announcement

2016

▼PyTorch 1.0 Announcement

▼TPU: Tensor Processing Unit (2016)

2017

▼TensorFlow 1.0 Announcement (2017)

▼PGGAN (2017)

2018

▼StyleGAN(2018)

Chenlin Meng (Stanford → Pika)

▼D2C: Diffusion-Denoising Models for Few-shot Conditional Generation (NeurIPS 2021)

▼Improved Autoregressive Modeling with Distribution Smoothing (ICLR 2021)
▼SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations (ICLR 2022)

▼On Distillation of Guided Diffusion Models (CVPR 2023, award candidate)

2019

2020

2021

▼StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery (2021)

Lvmin Zhang @lllyasviel (Stanford University)

Style2Paints and related research/tools such as ControlNet, Fooocus, Forge, etc.

▼Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN (2017)

▼style2paints(2018)

▼Two-stage Sketch Colorization (SIGGRAPH Asia 2018)
▼Learning to Cartoonize Using White-box Cartoon Representations (2020)

▼ControlNet (2023)

▼Fooocus(2023)

▼Stable-Diffusion-WebUI-Forge (2024)

2022

▼clip2latent
▼DALL-E

▼DALL-E 2
▼Stable Diffusion

2023

▼GPT-4
▼Stable Diffusion XL
▼Stable Video Diffusion
▼SDXL Turbo
▼Japanese Stable VLM

2024

▼Gemini
▼Stable Cascade

▼OpenAI's video generation model Sora announced
▼Stable Diffusion 3

A turning point was also reached in artificial intelligence research. Methods that scrape massive numbers of documents and images from the Internet and learn from them as collective intelligence became widespread. Amazon, which opened its service in 2000, uses a 'recommendation system' that surfaces product and content suggestions on its shopping site. The underlying technique, collaborative filtering, is a simple statistics-based algorithm used on many online shopping sites. In addition, spam-mail filtering that applies the probability theory of Thomas Bayes, an 18th-century mathematician, was distributed as open source to countless mail systems. Thanks to this, the idea of 'collective intelligence and personalization through machine learning' took root in many people's minds.
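The Bayesian idea behind such spam filters fits in a few lines. Here is a minimal sketch of a naive Bayes classifier, using made-up word counts (the numbers and words are purely illustrative, not from any real filter):

```python
from collections import Counter

# Toy word counts from hypothetical training mail (illustrative numbers only)
spam_counts = Counter({"free": 40, "winner": 25, "meeting": 2})
ham_counts = Counter({"free": 5, "winner": 1, "meeting": 30})

def spam_probability(words, p_spam=0.5, alpha=1.0):
    """Naive Bayes: P(spam|words) ∝ P(spam) · Π P(word|spam), with Laplace smoothing."""
    n_spam = sum(spam_counts.values())
    n_ham = sum(ham_counts.values())
    vocab = len(set(spam_counts) | set(ham_counts))
    p_s, p_h = p_spam, 1.0 - p_spam
    for w in words:
        # Counter returns 0 for unseen words; alpha keeps probabilities nonzero
        p_s *= (spam_counts[w] + alpha) / (n_spam + alpha * vocab)
        p_h *= (ham_counts[w] + alpha) / (n_ham + alpha * vocab)
    return p_s / (p_s + p_h)

print(round(spam_probability(["free", "winner"]), 3))  # high: spam-like words
print(round(spam_probability(["meeting"]), 3))         # low: ham-like word
```

Real filters work the same way, just with word statistics gathered from millions of actual messages per user, which is exactly the 'personalization through machine learning' described above.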

Computers themselves also advanced thanks to graphics technology. Previously, the only ways to raise CPU processing speed were to raise the operating frequency (clock) or to parallelize, but circuit integration was approaching its limits. As a result, more and more games offloaded computation to the GPUs on video cards, which handle the pixel processing and vector calculations needed for 3D CG, and the GPU became a mainstream device in PC games and game consoles.

Low-level CG libraries and shader languages such as DirectX and OpenGL, which let GPUs be driven through common software across each company's hardware, also helped accelerate 3D CG processing in games. In addition, a research approach called 'GPGPU', which applies GPUs to scientific computation, was announced; because GPUs carry a large number of simple arithmetic units, they excel at workloads with high parallelism and computational density. NVIDIA packaged this capability as the CUDA library in 2007, and it remains in common use today. The computing infrastructure that individuals can use, which image generation AI absolutely depends on, and the term 'GPGPU' itself both date from this period.
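The advantage of those many simple arithmetic units is data parallelism: applying the same operation to every pixel at once rather than one at a time. The miniature sketch below illustrates the idea on a CPU by contrasting a per-pixel Python loop with a single vectorized NumPy operation (an analogy for GPU-style parallelism, not actual CUDA code):

```python
import time

import numpy as np

# A stand-in "image": 512×512 grayscale values in [0, 1)
img = np.random.default_rng(0).random((512, 512))

# Serial version: one "pixel" at a time, like a single scalar unit
t0 = time.perf_counter()
out_loop = np.empty_like(img)
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        out_loop[i, j] = img[i, j] * 0.5 + 0.25  # per-pixel brightness adjust
t_loop = time.perf_counter() - t0

# Data-parallel version: the same operation over all pixels in one expression
t0 = time.perf_counter()
out_vec = img * 0.5 + 0.25
t_vec = time.perf_counter() - t0

print(np.allclose(out_loop, out_vec))  # identical results
print(t_vec < t_loop)                  # vectorized form is far faster
```

On an actual GPU the gap is wider still, which is why training and running image generation models is practical only on hardware built for this kind of parallelism.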

Meanwhile, graphics research until then had focused on expressive techniques built on creative algorithms: toon shading and shader technology, and procedural programming that reproduces 'paintings that look like impressionist paintings' or generates cartoon- and animation-like pictures. There were also many projects tackling 'tasks without a defined goal', such as robots that draw pictures automatically.

Machine learning (ML) brought a major change to this situation, in particular when researchers studying image pattern recognition joined the CG field. Where tasks in graphic expression had previously lacked a defined goal, from then on artificial intelligence was trained on large numbers of images published on the Internet, and datasets (the input data used to measure learning performance against sample data) and evaluation methods were established as 'machine learning tasks'. Cases then emerged where training was iterated, evaluated, built into a model, and used to generate images by inference.

Concretely, this means tasks such as segmenting objects in images, handwriting recognition, human pose estimation, smile and age estimation, image quality assessment, automatic colorization of black-and-white images, automatic line extraction from sketches, noise removal, inpainting of missing regions, and super-resolution. Then, around 2014, when machine learning methodology was brought to style learning, variational autoencoders (VAE) and generative adversarial networks (GAN) were announced. Since then, 'generative artificial neural networks' have made even more astonishing progress.
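Among the generative models in the timeline, the diffusion model (Sohl-Dickstein: 2015) is the one underlying Stable Diffusion. It gradually adds noise to an image and learns to reverse the process. A minimal NumPy sketch of the forward (noising) step, using an assumed linear noise schedule (the schedule values are illustrative, not taken from any specific model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear noise schedule over T steps (illustrative values)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # ᾱ_t: how much of the image survives at step t

def noise_image(x0, t):
    """Forward diffusion: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε, with ε ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = rng.standard_normal((8, 8))  # stand-in for a tiny grayscale image
x_early = noise_image(x0, 10)     # early step: still close to the image
x_late = noise_image(x0, T - 1)   # last step: almost pure noise

print(np.corrcoef(x0.ravel(), x_early.ravel())[0, 1])  # close to 1
print(alphas_bar[T - 1])                               # close to 0
```

Training then teaches a network to predict and subtract that noise step by step; generation runs the learned denoising in reverse, starting from pure noise.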

From here on, as you may well know, the models that underlie today's generative AI have taken the leading role since the latter half of the 2010s.

In 2017, Google announced the Transformer in a paper titled <Attention Is All You Need>. Born from natural language processing, in particular translation projects between the world's languages, it is the core research behind today's large-scale language models, and it rests on the very simple idea that 'attention is all you need'.

Attention is a method of focusing on the parts of the data most likely to help prediction, operating on data that quantifies how related words are to one another. Rather than learning 'language-specific grammar' such as 'Japanese: subject → predicate' or 'English: subject → verb → object', it focuses on the relevance between individual words (or characters, an even smaller unit). This demands large memory and an enormous number of computations, but once training is done the results of pre-training can be applied quickly, so the models can run even in modest computing environments. As research has continued in recent years, Transformers have been applied to all kinds of existing artificial neural network projects and are now used for a wide range of inference tasks: text summarization, speech, music, image style, and video.

Turning to projects related to image generation: OpenAI announced DALL-E, a text-to-image generation model, in 2021, and DALL-E 2 was released in limited form in 2022. Midjourney, another text-to-image generator, then launched on Discord, and finally, in August 2022, Stable Diffusion was released in a form anyone could download for free and run on the GPU of an ordinary PC.

Thanks to this, people all over the world began to create and share images using image generation AI tools.

In 2023 this movement accelerated further: OpenAI announced GPT-4, and Microsoft, its partner, brought GPT and the image-generating DALL·E 3 to the Bing search engine. Stability AI, which had released Stable Diffusion, went on to announce Stable Diffusion 2; Stable Diffusion XL, with richer rendering of light and space; Stable Video Diffusion, which creates videos; SDXL Turbo, which creates high-quality images at high speed; and Japanese Stable VLM, which can interpret images in Japanese.

Entering 2024, the momentum shows no sign of slowing, especially in the US graphics-tech industry. Google joined the service competition with OpenAI and Microsoft in earnest by launching Gemini. Stability AI released the completely new image generation models Stable Cascade and Stable Diffusion 3, and OpenAI announced Sora, a video generation model. Sora is not merely an image generator but a cutting-edge large-scale generative model that operates as a 'world simulator'. And of course, many startups continue to announce image generation technologies and services.

So far, we have looked at the birth and evolution of image generation AI.

If you are interested and curious, this is an era in which, rather than leaving AI a mystery, you can understand it and realize exactly the images you want with free applications.

I hope you realize that you are standing at a turning point in history.
