
Convolutional Neural Network (CNN)

Have you ever wondered how your eyes can see so many images, in so many colours, in real time? Not only that: your brain stores them, and if a car is coming toward you it can even make you move. All because of your eyes. Miraculous!
    
The mechanism behind this is nearly the same across the visual cortex of all mammals: dogs, cats, cows, lions and so on. In computer science, the technique that mimics it is called a "convolutional neural network" (CNN/ConvNet) or a "space invariant artificial neural network" (SIANN). In this article we will discuss the history of CNNs, the inspiration behind them, the logical steps they follow, the basic working of a sample CNN, and some of their use cases. So let's begin.

  History of CNNs

Some of the initial work in this field was done in the 1950s and 60s in the US by Hubel and Wiesel. Their key idea was the receptive field. It laid the groundwork for years to come and inspired later work such as the neocognitron, shift-invariant neural networks, LeNet, etc. But the big mic-drop moment came when Yann LeCun (now director of AI research at Facebook) published "Gradient-Based Learning Applied to Document Recognition" in 1998. After that, many organizations such as post offices, banks and telecom companies started using the technology to recognize digits and characters. The field has since grown exponentially and produced milestones like ImageNet and AlexNet. Today we use this technology every day on our phones, whether we are browsing social media, unlocking the screen or doing many other activities.
To know more about its history, click here and read the papers included with it.

  Inspiration behind CNNs

In the mammalian visual cortex, a very complex task is handled in small chunks by different parts of the cortex and by cells of varying complexity. For example, Hubel and Wiesel identified two types of visual cells:
1. Simple cells: these recognize the orientations of an object's different edges, which means they build up a '3D wireframe' of the given object.
2. Complex cells: these take input from many simple cells, so they also recognize orientations and edges, but they don't care about the exact location of those edges.

As scientists came to understand how our visual cortex works, they began experimenting with mimicking this complex task of 'seeing' using computers. But back in the 1950s they had neither the computing power nor the data, so the field had to wait almost another 50 years, until Yann LeCun and others published their paper and actually built a working model for recognizing handwritten digits, applying the research that had been done before to give computers eyes!


  How does it work?

First, the network sees the image as a three-dimensional matrix (height x width x depth). The depth is three for an RGB image, describing the redness, greenness and blueness of each pixel; for a grayscale image it is one, with every cell describing the blackness of a pixel, which effectively makes it a 2D matrix.
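For example (a minimal NumPy sketch, with random pixel values standing in for a real image):

    import numpy as np

    # A 4x4 RGB image: height x width x depth, with depth = 3 (R, G, B)
    rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
    print(rgb.shape)   # (4, 4, 3)

    # A 4x4 grayscale image: one intensity value per pixel, so just 2D
    gray = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
    print(gray.shape)  # (4, 4)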

It then passes these values to a convolutional block. Each such block consists of a convolution layer, a pooling layer and a normalization layer, in varying numbers. A CNN contains more than one such block, and together they process the data.


Once that is done we get an output matrix, which is flattened into a one-dimensional vector. We then apply the softmax function to this vector to turn every element into a probability, and the class with the maximum probability is our answer.


The whole pipeline looks something like this:
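Here is a rough sketch of that pipeline (assuming PyTorch is available; the layer sizes and the class count are arbitrary picks for illustration):

    import torch
    import torch.nn as nn

    # Two "convolutional blocks" (convolution -> normalization -> ReLU -> pooling),
    # then flatten and convert to class probabilities.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),    # block 1
        nn.BatchNorm2d(8),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(8, 16, kernel_size=3, padding=1),   # block 2
        nn.BatchNorm2d(16),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),                                 # matrix -> 1-D vector
        nn.Linear(16 * 8 * 8, 10),                    # 10 output classes
        nn.Softmax(dim=1),                            # probabilities per class
    )

    x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
    probs = model(x)
    print(probs.argmax(dim=1))      # the predicted class

(In real training the final Softmax is usually dropped, because loss functions like nn.CrossEntropyLoss apply it internally.)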

    
    Logical steps followed in a CNN

     Step 1: Prepare a dataset to train from

There are plenty of open datasets of labeled images to train from, for example CIFAR-10, COCO and ImageNet.


     Step 2: Doing the convolution of the matrix



Now in this step we take an n x n weight matrix, called the kernel or filter, and convolve it across the whole image matrix. (Take a bucket of red paint and one of blue, mix the two, and you have just convolved a new colour: that is the intuition behind convolution.) Sliding the kernel over the image and taking a dot product at every position gives us a new matrix called a feature map.

When the region under the filter matches the curve encoded by our trained weight matrix, the dot product of the two matrices is a large number.



When the region contains no non-zero values where our filter expects them, the dot product of the two matrices comes out to zero.
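Here is a minimal NumPy sketch of that sliding dot product; the 3x3 'curve detector' filter is invented for illustration:

    import numpy as np

    def convolve2d(image, kernel):
        # Slide the kernel over the image; each output cell is a dot product.
        # (Like most deep-learning libraries, we do not flip the kernel, so
        # strictly speaking this computes a cross-correlation.)
        kh, kw = kernel.shape
        out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return out

    kernel = np.array([[0, 1, 0],
                       [1, 0, 0],
                       [0, 1, 0]])    # a toy "curve detector" filter
    image = np.zeros((5, 5))
    image[1:4, 1:4] = kernel          # plant a matching curve in the image
    print(convolve2d(image, kernel))  # the peak (3.0) sits where the pattern matches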


     Step 3: Pooling the feature map

In this step we reduce the dimensions of the feature map using one of various pooling techniques, such as max pooling. This is also called downsampling.
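A minimal NumPy sketch of 2x2 max pooling (the input values are made up):

    import numpy as np

    def max_pool(fmap, size=2):
        # Keep only the largest value in each non-overlapping size x size window.
        h, w = fmap.shape
        out = np.zeros((h // size, w // size))
        for i in range(0, h - h % size, size):
            for j in range(0, w - w % size, size):
                out[i // size, j // size] = fmap[i:i+size, j:j+size].max()
        return out

    fmap = np.array([[1, 3, 2, 0],
                     [4, 2, 1, 1],
                     [0, 1, 5, 6],
                     [2, 2, 7, 8]])
    print(max_pool(fmap))   # [[4. 2.]
                            #  [2. 8.]]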




     Step 4: Normalization

Once we are done with pooling, we have to bring all the values of the matrix into a specific range so the network can process them. The range, and the method used, depend on the problem. There are several functions for this, such as softmax, ReLU and tanh(). We use ReLU when we want only positive values, since it sets all negative values to zero. We use softmax when we need a probability for every class.
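ReLU, for instance, is a one-liner in NumPy (softmax appears under Step 6):

    import numpy as np

    def relu(x):
        # Negative values become zero; positive values pass through unchanged.
        return np.maximum(0, x)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]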

    Step 5: Regularization

When we train a model on some specific data, it can happen that the deployed model never reaches the accuracy it showed during training, because it has become too accustomed to the data we gave it and can no longer generalize to new data. This problem is called overfitting in machine learning. To avoid it in our neural net, we use a technique called regularization.
 It looks like this:
\min_{f} \sum_{i=1}^{n} V(f(x_i), y_i) + \lambda R(f)

where the term with lambda is the regularizer term. Several techniques can be used for regularization in CNNs, and the one proposed by Geoffrey Hinton, called dropout, is the most widely used.
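Before getting to dropout, here is a toy illustration of the formula above in NumPy (an L2 regularizer, with R(f) the sum of squared weights; the loss value, weights and lambda are made up):

    import numpy as np

    def l2_regularized_loss(data_loss, weights, lam=0.01):
        # Total loss = data loss + lambda * R(f), with R(f) = sum of squared weights.
        return data_loss + lam * np.sum(weights ** 2)

    W = np.array([0.5, -1.2, 2.0])
    print(l2_regularized_loss(data_loss=0.8, weights=W))  # 0.8 + 0.01 * 5.69 = 0.8569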

In the dropout method we randomly disable different neurons, forcing the network to learn new ways to classify the object correctly, which improves accuracy.
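A minimal NumPy sketch of ('inverted') dropout at training time, assuming a keep probability of 0.5:

    import numpy as np

    def dropout(activations, keep_prob=0.5):
        # Randomly zero out neurons; scale the survivors so the expected
        # total activation stays the same ("inverted" dropout).
        mask = np.random.rand(*activations.shape) < keep_prob
        return activations * mask / keep_prob

    a = np.array([0.2, 0.9, 0.4, 0.7, 0.1, 0.8])
    print(dropout(a))   # roughly half the values are zeroed on each call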


   
    Step 6: Probability conversion

Now we finally have our output matrix, and we will predict our answer from it. First we vectorize (flatten) the matrix; then we apply the softmax function to convert the outputs into probabilities.

\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}
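This translates almost directly into NumPy (subtracting the maximum first is a standard trick for numerical stability):

    import numpy as np

    def softmax(z):
        # Turn raw scores into probabilities that sum to 1.
        e = np.exp(z - np.max(z))   # subtract the max for numerical stability
        return e / np.sum(e)

    scores = np.array([2.0, 1.0, 0.1])   # flattened output, one score per class
    print(softmax(scores))               # approx [0.659 0.242 0.099]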

    Step 7: Choosing the right class

In this last step we choose the class with the maximum probability, using the argmax function.
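Continuing the softmax example above (the class labels are hypothetical):

    import numpy as np

    probs = np.array([0.659, 0.242, 0.099])   # from the softmax step
    classes = ["cat", "dog", "car"]           # made-up labels for illustration
    print(classes[np.argmax(probs)])          # -> cat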

    Use cases for CNNs

When the positions of rows and columns don't matter in the problem being solved (as in a customer database), a CNN shouldn't be used. But when interchanging rows and columns would give you completely different data (images, videos, sound, etc.), a CNN can be of good use.

A few examples:
1. Object recognition in images or video
2. Image and video generation using GANs (generative adversarial networks)
3. Pattern detection in audio, or generating audio from a pre-loaded pattern
etc.


So that's what CNNs are and how they work. Here are some good reads and videos:

Convolutional neural networks explained, by Siraj Raval. Subscribe to this channel for cool videos.

     End note

If you found this article helpful, share it with your friends on Facebook, Twitter and LinkedIn. And if you have any doubts or feedback, leave a comment below. I wish you good luck on your learning journey!


ARTICLE BY : DHRUMIL BAROT 
