Neural networks are everywhere and have been extremely successful. Have you ever stopped to think about how a computer learns to see the world as well as a person does? How can it recognize your face, or even your retina, when you unlock a smart device? To do this, smart devices use deep learning and computer vision. Interested? Buckle up and let’s go!
How is an image seen by a computer?
Let’s take a handwritten digit from the MNIST dataset. When we pass it to a computer, what the computer actually sees is pixels. The image is broken down into a grid of squares, and each square has a color associated with it. If the image is grayscale (often loosely called black & white), the color is represented by a single layer of values between 0 and 255:
- 0 for a black pixel
- 255 for a white pixel
- Values in between form a grayscale
In the same way, a color image has three layers (typically red, green and blue). Instead of one value per pixel, you have three, and the mixture of the three values gives the color of the pixel.
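To make this concrete, here is a minimal sketch of how a tiny image could be stored as plain Python lists. The pixel values and the helper `describe` are made up for illustration; real images are usually loaded with an image library rather than typed by hand.

```python
# A tiny 2x2 grayscale image: one value per pixel, from 0 (black) to 255 (white).
# These pixel values are invented for illustration.
gray_image = [
    [0, 128],
    [200, 255],
]

# A 2x2 color image: three values per pixel (assumed here to be red, green, blue).
color_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def describe(value):
    """Name a grayscale value using the 0-255 convention (hypothetical helper)."""
    if value == 0:
        return "black"
    if value == 255:
        return "white"
    return "gray"


labels = [[describe(v) for v in row] for row in gray_image]
print(labels)  # [['black', 'gray'], ['gray', 'white']]
```

Each inner list is one row of the image, so `gray_image[0][1]` is the pixel in the first row, second column.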
But how does a computer use images?
Suppose now that you have a 2×2 image. To make it easier to process, we flatten the image into a one-dimensional vector.
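Flattening can be sketched in a couple of lines of plain Python. The pixel values below are invented for illustration, and the function name `flatten` is just a label chosen for this example.

```python
def flatten(image):
    """Turn a 2D grid of pixels into a 1D list, reading row by row."""
    return [pixel for row in image for pixel in row]


# A 2x2 grayscale image with made-up pixel values.
image = [
    [0, 128],
    [200, 255],
]

flat = flatten(image)
print(flat)  # [0, 128, 200, 255]
```

The 2×2 grid becomes a vector of 4 values, which is the shape a simple network expects at its input layer.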
And now we enter the world of neural networks. Suppose we have an MLP (Multilayer Perceptron) that classifies images as Cat, Dog, Turtle or Bird. We take our 2×2 image of a turtle and feed it to the input layer of the MLP; calculations are made in the hidden layers, and the output layer gives us the probability of each class.
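The flow from input layer to class probabilities can be sketched as a single forward pass. This is only an illustration under several assumptions that are not in the text above: one hidden layer with 3 units, a ReLU activation, a softmax output, pixel values scaled to the 0-1 range, and random untrained weights. Because the weights are random, the printed probabilities mean nothing; a real network would learn its weights from a dataset.

```python
import math
import random

CLASSES = ["Cat", "Dog", "Turtle", "Bird"]


def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Keep positive values, replace negative ones with zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(values)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out, n_in, rng):
    """Random, untrained weights and zero biases (illustration only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
w1, b1 = random_layer(3, 4, rng)  # input (4 pixels) -> hidden (3 units, assumed size)
w2, b2 = random_layer(4, 3, rng)  # hidden (3 units) -> output (4 classes)

pixels = [0, 128, 200, 255]          # flattened 2x2 image, made-up values
inputs = [p / 255 for p in pixels]   # scale to 0-1 (a common choice, assumed here)

hidden = relu(dense(inputs, w1, b1))
probs = softmax(dense(hidden, w2, b2))

for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.3f}")
```

The output layer has one value per class, and the softmax step is what makes those four values behave like probabilities.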
You can see how the process of learning from a dataset of images works here, and if you are new to the field and this caught your interest, you can start with this amazing post by Cassie Kozyrkov here. If you want a hands-on introduction, you can go for it here. If you want more content like this, I have a YouTube channel.