
Tuesday, February 18, 2025

1-4 Let's learn about artificial neural networks (Stable Diffusion Practical Guide Table of Contents)


Today's AI technology is built on artificial neural networks and on machine learning that uses them. This section explains them in detail.

Compared with earlier AI, current AI is based on machine learning that runs at high speed on large-scale computational resources and large-scale data sets. Here we cover the basics of artificial neural networks, which are the core of this technology, and of cutting-edge deep learning.

>>> What is an artificial neural network?

The artificial neural network, a central concept in modern AI, is one of the technologies that allows computers to learn, predict, and judge. It is a mathematical model that imitates the way the human brain processes information. Just as the brain's nerve cells transmit signals to one another, an artificial neural network transmits signals through numerous connected 'nodes' (or 'neurons').

The brains of all living things, including humans, contain countless nerve cells called neurons that exchange information with each other. It is said that if all the neurons and synapses in a person's body were laid end to end, they would stretch about 1 million kilometers. A neuron receives and processes information from other neurons and passes it on to other neurons (as an electrical signal converted into neurotransmitters), and the junction where this transmission occurs is called a 'synapse'. The 'strength of the synaptic connection' determines whether the transmitted signal is strengthened or weakened (left -> right in the figure below on the left). Because this strength changes in response to stimulation, it can be regarded as the flexible memory and learning mechanism of living things.

Artificial neural networks reproduce the neurons and synapses of the human brain on a computer. Inside the computer, units called 'nodes' act as neurons, and the connections between them are given weights. Each input is multiplied by its weight, and the bias is a value added to that weighted sum, so it determines the node's output when the input is 0. The full set of 'weights and biases' makes up the network model, and the process of adjusting them by comparing the outputs produced for given inputs against the desired results is model learning.
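A single node can be sketched in a few lines of Python. This is only an illustration: the function names `sigmoid` and `neuron`, the choice of the sigmoid activation, and all numeric values are assumptions made for this example and do not come from the original text.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values chosen only for illustration.
weights = [0.8, -0.4]
bias = 0.5

print(neuron([1.0, 0.5], weights, bias))  # weighted inputs plus bias
print(neuron([0.0, 0.0], weights, bias))  # with zero input, only the bias remains
```

With all inputs at 0, the weighted sum vanishes and the output depends on the bias alone.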

>>> Learning by backpropagation

One of the methods for training and adjusting a model is 'backpropagation'. This is the process of adjusting the weights and biases so that the network can derive the correct answer. The error between the network's answer and the correct answer is calculated, and the weights of the entire network are modified slightly in the direction that reduces the error. To do this, a loss function is first chosen for the network, that is, a function that indicates 'how accurate the output value is'. Backpropagation proceeds in the following four steps.

(1) Forward propagation: Input values are fed into the network and calculations are performed at every layer to obtain the final output. At this point, the weights of the network are either random (at the start of training) or the values left by previous learning.

(2) Error calculation: The difference between the output of the network and the actual correct answer is calculated. This difference is called the 'error'. The loss function evaluates the error and expresses as a number how accurate or wrong the prediction is.

(3) Backpropagation: The calculated error is propagated backward from the output layer toward the input layer. For the weights of each layer, their 'responsibility' for the error is calculated to find out how much each weight should be adjusted to reduce the error.

(4) Weight update: The weights of the network are updated to minimize the error. An optimization method called stochastic gradient descent is used for this. The learning rate parameter determines how far the weights are adjusted in each update. If the learning rate is too large, learning becomes unstable, and if it is too small, learning takes a long time.
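The four steps above can be sketched for the smallest possible case, a single linear neuron with one weight and one bias. The toy data set, the squared-error loss, the learning rate, and the name `train_step` are all assumptions chosen for this illustration, not part of the original text.

```python
import random

random.seed(0)

# Hypothetical toy data that follows y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

def train_step(x, target, w, b, lr):
    # (1) Forward propagation: compute the output from the current weights.
    y = w * x + b
    # (2) Error calculation: squared-error loss between output and answer.
    loss = (y - target) ** 2
    # (3) Backpropagation: how much w and b are each "responsible" for the loss.
    grad_y = 2.0 * (y - target)
    grad_w = grad_y * x
    grad_b = grad_y
    # (4) Weight update: move a small step against the gradient.
    w -= lr * grad_w
    b -= lr * grad_b
    return w, b, loss

w = random.uniform(-1.0, 1.0)  # random initial weight
b = 0.0
learning_rate = 0.02           # assumed value; too large would make training diverge

for epoch in range(500):
    random.shuffle(data)       # "stochastic": one randomly ordered sample at a time
    for x, target in data:
        w, b, loss = train_step(x, target, w, b, learning_rate)

print(round(w, 3), round(b, 3))
```

After training, `w` and `b` should be close to the values 2 and 1 that generated the toy data.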

As a result, the weight (W) indicates the contribution of each input signal, which tells us how much each input item affected the output of the neuron. The more important an input signal is, the greater the weight associated with it. In short, the weight controls the importance and influence of the input signal. The bias (B) corresponds to the threshold of neuron activation: it controls how readily the neuron produces an output. Even if the input is 0 or very small, a neuron with a bias can still activate.

>>> Multilayer structure of artificial neural network

The behavior of a single neuron was explained above, but general artificial neural networks have a multilayer structure that is largely divided into three roles. The first is the input layer, which receives the data. For example, in image recognition, the pixel values of the image are the input. The second is one or more hidden layers, which perform complex calculations and feature extraction: they process the data from the input layer and send it on to the output layer. The third is the output layer, which outputs the final result or prediction. For example, when classifying dogs and cats, the output layer gives the prediction of whether the image is a dog or a cat. Learning with a network that has many hidden layers is called 'deep learning', and such a network can learn more complex features and patterns.
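The three roles can be sketched as a tiny forward pass. Everything concrete here is a hypothetical choice for illustration: the three "pixel" values, the hand-picked weights, the ReLU activation, and the helper name `dense`. A real network would learn its weights rather than have them written by hand.

```python
def relu(z):
    """A common activation: pass positive values, clip negatives to 0."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One layer: each row of `weights` feeds one node of the layer."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

# Input layer: three pixel values (hypothetical).
pixels = [0.2, 0.9, 0.5]

# Hidden layer with two nodes, output layer with two nodes (hand-picked values).
hidden_w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0], [-1.0, 1.0]]
output_b = [0.0, 0.0]

hidden = dense(pixels, hidden_w, hidden_b, relu)            # feature extraction
scores = dense(hidden, output_w, output_b, lambda z: z)     # one score per class

labels = ["dog", "cat"]
prediction = labels[scores.index(max(scores))]
print(prediction, scores)
```

Adding more hidden layers means calling `dense` more times between the input and the output, which is the sense in which a deep network is "deep".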

Countless deep learning methods have been published. As one example, let's look at how a convolutional neural network (CNN) that recognizes handwritten characters actually processes its input. A convolutional neural network is a multilayer neural network: when an image of a character is fed in at the left side of the network, it determines which of several candidate labels the character corresponds to and outputs that label.

Image Source : https://towardsdatascience.com/the-math-behind-convolutional-neural-networks-6aed775df076/

In artificial neural networks, convolution is the mathematical operation of combining two functions to create a new function. It is widely used in signal processing, image processing, statistics, AI image generation, and many other fields. Here, convolution means taking a kernel (or filter), which is a grid of numbers that emphasizes contrast or detects edges, and a partial image of the same size (called a window), calculating the sum of the products of their elements, and converting that into a single value. If the window is slid step by step across the image and the conversion is repeated at each position, the image is converted into a smaller grid of numerical data. This grid is called a 'tensor' and represents the characteristics of the image.
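The sum-of-products over a sliding window can be written out directly. The following is a minimal sketch: the function name `convolve2d`, the 5x5 test image, and the vertical-edge kernel are assumptions for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over the image and return the grid of sums of products."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Multiply each window pixel by the matching kernel value and add up.
            total = sum(image[top + a][left + b] * kernel[a][b]
                        for a in range(k) for b in range(k))
            row.append(total)
        output.append(row)
    return output

# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical 3x3 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row prints [3, 3, 0]
```

The large values appear where the window straddles the dark-to-bright boundary, and the value is 0 where the window covers a uniform region.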

Going back to the example of handwritten character recognition, the original 'image with a size of 32x32' is passed through a '28x28x20 convolution layer'. This first convolution layer receives the input image directly, and the filter (kernel) moves over the image like a window until all pixels have been processed. At each position, the filter values are multiplied element by element with the pixels under them and the sum of the results (sum of products) is output. The filter is then slid by a certain number of pixels (the stride), and this process is repeated over the entire input data. In this way a feature map (or activation map) is generated, which is a tensor representing the response of the filter to the input data. The feature map is the 'spatial distribution' of the features that correspond to the filter. In short, in handwritten character recognition, it characterizes and evaluates 'where and what lines exist' in various input images.
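The jump from 32x32 to 28x28 follows from the usual output-size formula for a convolution. The text does not state the kernel size, so the 5x5 kernel, stride of 1, and zero padding used below are assumptions that happen to reproduce the 28x28 figure; the depth of 20 would then correspond to 20 separate filters.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of window positions along one dimension of the input."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# Assumed settings: 5x5 kernel, stride 1, no padding.
side = conv_output_size(32, 5)
num_filters = 20  # one feature map per filter

print((side, side, num_filters))  # (28, 28, 20)
```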

GitHub - Machine Learning for Art/Demo:Artificial Neural Network

https://ml4a.github.io/demos/forward_pass_mnist/

The purpose of the convolution layer is to extract features, that is, to detect in which part of the input data a specific characteristic (edge, texture, color, etc.) exists. The main purpose of the 'pooling layer' that follows is to 'reduce the size of the feature map' extracted by the convolution layer. In this layer, the feature map is divided into small areas and either the maximum (max pooling) or the average (average pooling) is taken from each area. The effect is similar to blurring the image. This reduces the size of the output feature map and the amount of calculation. In addition, pooling gives the model invariance to minor changes in position, so even if a line in the image moves or rotates slightly, it can be recognized as the same image.

▲ Example of CNN configuration for evaluating hand-drawn images. Source: DeepAge 'Understanding Standard Convolutional Neural Networks from 0' (https://deepage.net/deep_learning/2016/11/07/convolutional_neural_network.html)
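The pooling step described above can be sketched as follows. The function name `pool2d`, the 2x2 area size, and the sample feature map are assumptions chosen for illustration.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Split the feature map into size x size areas and keep one value per area."""
    reduce = max if mode == "max" else (lambda values: sum(values) / len(values))
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            area = [feature_map[i + a][j + b]
                    for a in range(size) for b in range(size)]
            row.append(reduce(area))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 4],
]

print(pool2d(feature_map, mode="max"))      # [[4, 2], [2, 5]]
print(pool2d(feature_map, mode="average"))  # [[2.5, 1.0], [0.75, 3.5]]
```

The 4x4 map shrinks to 2x2 in both cases. With max pooling, the strongest response in each area survives even if it shifts by one pixel inside that area, which is where the tolerance to small position changes comes from.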

After 'feature dimension reduction' and 'invariance acquisition' have been applied repeatedly in this way, the feature map is finally passed to the fully connected layer. The fully connected layer determines which class the image belongs to based on the features extracted so far. In the handwritten character recognition example mentioned above, it distinguishes the Japanese letters 'a' and 'me', and in image classification it distinguishes dogs, cats, people, and so on. At this layer, weights connect every input to every output, and 'how close' the image is to each class can be expressed as a similarity score. Passing the extracted features through this final classification or regression step improves accuracy and stability.

▲ Source: DeepAge 'Understanding Standard Convolutional Neural Networks from 0' (https://deepage.net/deep_learning/2016/11/07/convolutional_neural_network.html)
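The final classification step can be sketched like this. The original text does not say how the per-class scores are normalized, so the use of softmax here is an assumption (it is a common choice), and the feature values, weights, class names, and function names are all hypothetical.

```python
import math

def fully_connected(features, weights, biases):
    """Every input feature contributes to every class score through its own weight."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into positive values that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical flattened feature map coming out of the last pooling layer.
features = [0.9, 0.1, 0.4]

# Hypothetical weights: one row per class.
classes = ["dog", "cat", "person"]
weights = [[1.2, -0.5, 0.3], [0.2, 0.9, -0.1], [-0.7, 0.4, 1.0]]
biases = [0.0, 0.1, -0.2]

scores = fully_connected(features, weights, biases)
closeness = softmax(scores)
best = classes[closeness.index(max(closeness))]

for name, value in zip(classes, closeness):
    print(name, round(value, 3))
print("prediction:", best)
```

Each printed value can be read as 'how close' the input is to that class, and the class with the largest value is reported as the prediction.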

These artificial neural networks can also be connected to one another. For example, in image classification, a network that takes general images (dog, cat, giraffe, person, etc.) and responds strongly when it recognizes a human face can be connected to further networks for smile detection, angry-face detection, age estimation, and facial expression classification, building an image recognition system that appears to read human emotions.

This multilayer structure of artificial neural networks is also used in Stable Diffusion, which combines three large structures with different roles to perform the task of 'generating images from text'. However, an example of classification alone is not enough to understand the image generation process. The next section explains the latent diffusion model.
