Convolutional Neural Networks (ConvNets) have become very popular in the research and development community in recent years. From image processing to pattern recognition and classification, ConvNets have been tried in a variety of domains and have produced excellent results. A ConvNet is essentially a modified multi-layer perceptron, designed so that it needs much less processing than many other networks, because each filter's weights are reused across the whole image. Being part of Deep Learning, its architecture places many hidden layers between the input layer and the output layer. These hidden layers are mainly built from one or more of the following popular layer types: Convolutional, Rectified Linear Unit (ReLU), Pooling and Fully Connected. Without going too deep into the mathematics, these layers are explained below in layman’s terms for simplicity and better understanding.
Convolutional Layers are the core layers and perform most of the computational work. They slide a set of masks (filters) of various sizes over the image, where each mask encodes a specific feature and therefore responds to the small patterns in the image that match it. Each filter produces its own convolved image. Thus a single image is turned into ‘n’ filtered images, one per filter, and stacking these filtered images together gives the output of the convolutional layer.
The ReLU layer performs the activation step rather than any normalization. It introduces non-linearity into the ConvNet, since most real-life data on which the network will be trained is non-linear. The ReLU operation is applied at the pixel level: each negative value is set to zero, and non-negative values pass through unchanged. Other non-linear functions such as the hyperbolic tangent (tanh) or the sigmoid could also be used, but ReLU has been found to produce better results in most cases.
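To make the convolution and ReLU steps concrete, here is a minimal sketch in plain Python. The 4x4 image, the two 2x2 filters and the names `convolve2d` and `relu` are all made up for illustration; a real ConvNet learns its filter values during training and would use an optimized library instead of nested loops.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    As in most ConvNet libraries, the kernel is not flipped, so strictly
    speaking this is cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative value with zero, element by element."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Two hand-picked 2x2 filters (assumed values, not learned ones).
filters = {
    "dark_to_bright": [[-1, 1], [-1, 1]],
    "bright_to_dark": [[1, -1], [1, -1]],
}

# One filtered image (feature map) per filter, stacked in a dict.
feature_maps = {name: convolve2d(image, k) for name, k in filters.items()}
activated = {name: relu(fm) for name, fm in feature_maps.items()}

print(feature_maps["dark_to_bright"])  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
print(feature_maps["bright_to_dark"])  # [[0, -18, 0], [0, -18, 0], [0, -18, 0]]
print(activated["bright_to_dark"])     # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The first filter responds strongly where the image goes from dark to bright, while the second produces negative values at the same place, which ReLU then sets to zero.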
Next comes the Pooling Layer. It shrinks the image to a smaller, more essential form. Max Pooling is one of the most popular variants: it takes the maximum, i.e. the most prominent feature, from each block of neurons of the previous layer. Similarly, Average Pooling takes the average value of each cluster of neurons in the previous layer. The above three layers are cascaded repeatedly, as many times as required, which is also known as Deep Stacking.
The Fully Connected Layer connects each neuron of one layer to every neuron in the next layer. It is very similar to a multilayer perceptron. The output of the Deep Stacked layers contains high-level features of the input image, and the Fully Connected Layer uses these features to classify the image into classes based on the training dataset. The output probabilities of this layer sum to 1, which is achieved by the Softmax activation function. This function takes a vector of real-valued scores and squashes it into a vector of values between zero (the lowest) and one (the highest) that also add up to one.
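The pooling, fully connected and Softmax steps can be sketched the same way. This is only an illustration under stated assumptions: the feature map, the three-class weight matrix and the helper names `pool2d`, `dense` and `softmax` are hypothetical, and pooling here uses non-overlapping 2x2 blocks.

```python
import math


def pool2d(feature_map, size=2, mode="max"):
    """Shrink a feature map by summarizing non-overlapping size x size blocks."""
    summarize = max if mode == "max" else (lambda block: sum(block) / len(block))
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            block = [feature_map[r + i][c + j]
                     for i in range(size) for j in range(size)]
            row.append(summarize(block))
        pooled.append(row)
    return pooled


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input value."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Map real-valued scores to values in [0, 1] that add up to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4x4 feature map coming out of the earlier layers.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 4, 2],
    [0, 2, 8, 6],
    [4, 2, 6, 4],
]

max_pooled = pool2d(feature_map, mode="max")      # [[7, 4], [4, 8]]
avg_pooled = pool2d(feature_map, mode="average")  # [[4.0, 2.0], [2.0, 6.0]]

# Flatten the pooled features and classify into three made-up classes.
features = [v for row in max_pooled for v in row]
weights = [                      # assumed values; training would learn these
    [0.1, 0.0, 0.0, 0.1],
    [0.0, 0.2, 0.1, 0.0],
    [-0.1, 0.0, 0.0, 0.0],
]
biases = [0.0, 0.0, 0.0]

scores = dense(features, weights, biases)
probabilities = softmax(scores)
print(max_pooled, avg_pooled)
print(probabilities, sum(probabilities))
```

Whatever the raw scores are, the values returned by `softmax` lie between zero and one and add up to one (up to floating-point rounding), so the largest one can be read as the predicted class.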