For a given perceptron, the weights and threshold are fixed and together represent a decision-making strategy. We can therefore change the strategy by adjusting the weights and the threshold.

One thing worth pointing out about the threshold: for convenience, it is usually expressed by its negative, b = -threshold, where b is called the bias.

In this way, the earlier rule for computing the output becomes: if w1x1 + w2x2 + w3x3 + ... + b > 0, then output = 1; otherwise output = 0.

Now take a perceptron with two inputs, weights w1 = w2 = -2, and bias b = 3.

Obviously, output = 0 only when x1 = x2 = 1, because (-2)*1 + (-2)*1 + 3 = -1, which is less than 0. For every other input, output = 1.

In other words, this perceptron is actually a "NAND gate"!
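As a quick sanity check, here is a minimal sketch in plain Python (the function name is just illustrative) that evaluates this two-input perceptron, with w1 = w2 = -2 and b = 3, on all four input combinations:

```python
def perceptron(x1, x2, w1=-2, w2=-2, b=3):
    """Perceptron with the weights and bias from the text: 1 if w·x + b > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", perceptron(x1, x2))
# Prints 1 for every input except x1 = x2 = 1 -- exactly the NAND truth table.
```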

In computer science, the NAND gate is special among logic gates: any other gate can be built by combining NAND gates. This is called the universality of the NAND gate (gate universality).

Since a perceptron can express a NAND gate by choosing appropriate weight and bias parameters, it can in theory express any other logic gate.
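To make the universality claim a little more concrete, here is an illustrative sketch that reuses the NAND perceptron above to build NOT, AND, and OR; these are the standard NAND compositions, and the helper names are made up:

```python
def nand(a, b):
    # The NAND perceptron from the text: w1 = w2 = -2, bias b = 3.
    return 1 if -2 * a + -2 * b + 3 > 0 else 0

def not_(a):
    # NOT(a) = NAND(a, a)
    return nand(a, a)

def and_(a, b):
    # AND(a, b) = NOT(NAND(a, b))
    return not_(nand(a, b))

def or_(a, b):
    # OR(a, b) = NAND(NOT(a), NOT(b))
    return nand(not_(a), not_(b))

print([and_(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
print([or_(a, b) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 1]
```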

Therefore, perceptrons can also be wired together to form a computing system, like the Three-Body example mentioned earlier.

But that is hardly surprising: we already have computers, and doing it this way only makes things more complicated.

There is a limit to what a single perceptron can do. To make complex decisions, it is necessary to connect multiple perceptrons.

However, a real network may have tens of thousands or even hundreds of thousands of parameters. If they had to be configured by hand, one by one, the task would never be finished.

This is where the most distinctive feature of neural networks comes in.

Instead of specifying all the parameters ourselves, we provide training data and let the network learn on its own, finding the most appropriate values for all parameters during the learning process.

The general idea is this: we tell the network what output we expect for a given input, and each such piece of training data is called a training sample (training example).
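In code, a training sample is nothing more than an (input, expected output) pair; a tiny illustrative sketch (the numbers are made up):

```python
# Each training sample pairs an input with the output we expect for that input.
# The values here are made up purely for illustration.
training_data = [
    ([0, 0], 0),
    ([0, 1], 1),
    ([1, 0], 1),
    ([1, 1], 0),
]

for x, expected in training_data:
    print("input:", x, "expected output:", expected)
```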

The process is like a teacher explaining an abstract concept to students by giving concrete examples:

Generally speaking, the more examples we give, the better the abstract knowledge comes across. The same is true when training a neural network.

We can pour thousands of training samples into the network, and it will automatically distill the abstract knowledge hidden behind those samples.

That knowledge is embodied in the values of all the network's weight and bias parameters.

Suppose every parameter starts with some initial value. When we feed in a training sample, the network computes a single actual output value based on the current parameter values.

This value may differ from the output we expect. At that point, we can try adjusting some of the parameters to bring the actual output as close as possible to the expected output.

Once all the training samples have been fed in, the network's parameters have been adjusted toward their optimal values; by then, the actual output is very close to the expected output every time, and the training process is over.
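The adjustment idea described above can be sketched very crudely, purely for illustration: for each sample, nudge each parameter a little in whichever direction brings the actual output closer to the expected output. This toy hill-climbing loop is not the real training algorithm, and the tiny one-neuron model and sample values are assumptions made up for the sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# params = [w, b]: one weight and one bias for a single tiny neuron.
def output(x, params):
    w, b = params
    return sigmoid(w * x + b)

# Illustrative samples: input 0 should give output 0, input 1 should give output 1.
samples = [(0.0, 0.0), (1.0, 1.0)]

params, step = [0.0, 0.0], 0.1

for _ in range(500):                      # keep presenting the training samples
    for x, expected in samples:
        for i in range(len(params)):      # nudge each parameter in turn
            candidates = (params[i], params[i] + step, params[i] - step)

            # Keep whichever value brings the actual output closest to the expected one.
            def error(v):
                trial = params.copy()
                trial[i] = v
                return abs(output(x, trial) - expected)

            params[i] = min(candidates, key=error)

for x, expected in samples:
    print(x, expected, round(output(x, params), 2))
```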

If, during training, the network has learned to respond correctly (or nearly correctly) to tens of thousands of samples, then when we feed it a piece of data it has never seen before, there is a good chance it will still give us the decision we expect. This is how a neural network works.

But one problem remains: during training, when the actual output differs from the expected output, how should each parameter be adjusted?

Of course, before thinking about how to do it, we should first ask whether this approach of obtaining the desired output by adjusting parameters is feasible at all.

In practice, this approach is basically infeasible for perceptron networks.

For example, take the perceptron network with 39 parameters in the figure above: with the input held fixed, if we change the value of one parameter, the final output is essentially unpredictable.

It may flip from 0 to 1 (or from 1 to 0), or it may not change at all. The root of the problem is that both input and output are binary: they can only be 0 or 1.

If the entire network is regarded as a function (with input and output), then this function is not continuous.

Therefore, for training to be possible, we need a neural network whose input and output are continuous over the real numbers. Thus, sigmoid neurons appeared.

The sigmoid neuron is the basic building block commonly used in modern neural networks (though certainly not the only one). It is structurally similar to a perceptron, but with two important differences.

First, its input is no longer limited to 0 and 1, but can be any real number between 0 and 1.

Second, its output is no longer limited to 0 and 1: the weighted sum of the inputs plus the bias is passed through a function called the sigmoid function, and the result is the output.

Specifically, let z = w1x1 + w2x2 + w3x3 + ... + b; then output = σ(z), where σ(z) = 1/(1 + e^(-z)).
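Written out in plain Python (a direct, minimal transcription of the formula; the helper names are illustrative):

```python
import math

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)): smooth, continuous, always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    # z = w1*x1 + w2*x2 + ... + b, output = sigma(z)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

for z in (-5, -1, 0, 1, 5):
    print(z, round(sigmoid(z), 4))   # 0.0067, 0.2689, 0.5, 0.7311, 0.9933
```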

σ(z) is a smooth, continuous function, and its output is a real number between 0 and 1, so the output can be fed directly into the next layer of neurons while staying in the 0-to-1 range.

It follows that once a neural network is assembled from sigmoid neurons, its input and output become continuous: a small change to a parameter's value produces only a small change in the output. This is what makes training by gradually adjusting parameter values possible.
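A quick numerical comparison makes the point; the weights below are made-up values chosen so that the weighted sum sits right at the decision boundary. Nudging one weight by 0.02 flips the perceptron's output outright, while the sigmoid neuron's output barely moves:

```python
import math

def perceptron(x, w, b):
    # Step output: 1 if the weighted sum plus bias is positive, else 0.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def sigmoid_neuron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

x, b = [1.0, 1.0], 0.0
w_before = [0.6, -0.59]
w_after = [0.6, -0.61]          # second weight nudged by only 0.02

# The perceptron's output flips from 1 to 0 ...
print(perceptron(x, w_before, b), perceptron(x, w_after, b))
# ... while the sigmoid neuron's output barely moves (both values stay near 0.5).
print(sigmoid_neuron(x, w_before, b), sigmoid_neuron(x, w_after, b))
```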

Historically, many researchers have worked on this problem, and the example is also discussed in Michael Nielsen's book "Neural Networks and Deep Learning".

The neural network here has only one hidden layer, so it is a shallow neural network; real deep neural networks have multiple hidden layers.

In the figure, the far left is the input layer, which has 784 neuron nodes, one for each pixel of the image to be recognized; in the middle is a hidden layer with 15 neurons.

The far right is the output layer, which has 10 neuron nodes representing the recognition results 0, 1, 2, ..., 9 respectively. Of course, because of the sigmoid function σ(z), each output is a number between 0 and 1.

Once we have the set of output values, the output with the largest value gives the final recognition result.

During training, the expected output has the form: the node for the correct digit outputs 1 and all the other nodes output 0. The hidden layer and the output layer are also fully connected.
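Picking the recognition result from the ten outputs, and forming the expected output for a training sample, can both be written in a couple of lines (the output values below are made up for illustration):

```python
# Ten output-layer values (made-up numbers); the largest one decides the recognized digit.
outputs = [0.02, 0.01, 0.05, 0.03, 0.01, 0.08, 0.02, 0.91, 0.04, 0.06]
recognized = max(range(10), key=lambda d: outputs[d])
print(recognized)   # 7

# Expected output for a training sample whose correct digit is 7:
# 1 at position 7, 0 everywhere else.
expected = [1.0 if d == 7 else 0.0 for d in range(10)]
print(expected)
```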

The network has 784*15 + 15*10 = 11910 weight parameters and 15 + 10 = 25 bias parameters, for a total of 11910 + 25 = 11935 parameters.
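The count can be reproduced directly from the layer sizes, assuming full connectivity between adjacent layers as described:

```python
sizes = [784, 15, 10]    # input layer, hidden layer, output layer

# Each connection between adjacent layers carries one weight;
# every non-input neuron has one bias.
weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))   # 784*15 + 15*10 = 11910
biases = sum(sizes[1:])                                        # 15 + 10 = 25
print(weights, biases, weights + biases)                       # 11910 25 11935
```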

This is a very astonishing number.
