All Posts By

Ammar Khan

Using Deep Learning for Developing Platforms with Small Datasets

By Deep Learning

By Ammar Khan

Deep Learning is one of the most resource-hungry gorillas in computing today, in terms of the compute, memory, and dataset size required to build models. If you want to automate your platform so that it labels incoming and outgoing data on the fly, Deep Learning is the modern-day magic that can make it happen. But you might not have enough data to train these models, and you may not have enough compute and memory either.

Deep Learning models are loosely inspired by the human brain. Over the centuries, people have not needed to reinvent the wheel from scratch each time. We read history to learn about the solutions that solved problems in the past, and we write history for future generations too. Human genes play a similar role: each generation inherits a genome that already works, and small adjustments are made to that inherited structure. Adaptations that helped earlier generations survive are passed down, and our own genome is in turn passed on to the next generation. This is a continuous progression, with minor additions to a genome that was already functional.

Deep Learning offers something similar under the umbrella of ‘Transfer Learning’. There is a whole family of complex models, carefully crafted for particular problems and trained on enormous datasets. We can reuse one of them as a layer in our own model and, with very limited resources, build a model for our own task on a smaller dataset by building on the previously trained, more complex one.

The Assumed Problem:

Let’s say we run a platform that must separate the dog and cat images users upload and label them on the fly. But we only have 100 to 200 MB of data available on our platform, which is not enough to train a Deep Learning model from scratch. We can, however, leverage a model that has already learned from a much larger problem. We will import it as a layer and feed our data into the part of the network that we train for our specific problem.


VGG16 is a deep Convolutional Neural Network designed to solve complex image recognition problems. It was trained on the ILSVRC subset of the ImageNet dataset (about 1.3 million training images in 1,000 categories); the full ImageNet dataset comprises about 15 million labeled high-resolution images in roughly 22,000 categories. The architecture of VGG16 is:
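The convolutional part of VGG16 is regular enough to write down as data. The plain-Python sketch below lists the standard VGG16 configuration (13 convolutional layers with 3×3 kernels, grouped into five blocks separated by 2×2 max pooling) and counts its parameters. It deliberately leaves out the three fully connected layers at the end, since those are the part usually replaced in transfer learning. The helper name `conv_base_summary` is hypothetical and not part of any library.

```python
# Standard VGG16 convolutional configuration (the network without its
# three fully connected layers). Integers are 3x3 convolutions with
# "same" padding and that many output channels; "M" is 2x2 max pooling
# with stride 2.
VGG16_CONV_CONFIG = [64, 64, "M",
                     128, 128, "M",
                     256, 256, 256, "M",
                     512, 512, 512, "M",
                     512, 512, 512, "M"]


def conv_base_summary(config, in_channels=3, height=224, width=224):
    """Return (parameter count, output feature-map shape) for a conv base."""
    params = 0
    channels = in_channels
    for layer in config:
        if layer == "M":
            # Pooling halves each spatial dimension (rounding down)
            # and has no weights.
            height, width = height // 2, width // 2
        else:
            # One 3x3 kernel per (input channel, output channel) pair,
            # plus one bias per output channel. "Same" padding keeps
            # the spatial size unchanged.
            params += 3 * 3 * channels * layer + layer
            channels = layer
    return params, (height, width, channels)


params, shape = conv_base_summary(VGG16_CONV_CONFIG)
print(f"{params:,} parameters, feature map {shape}")
# 14,714,688 parameters, feature map (7, 7, 512)
```

The count of 14,714,688 should match what Keras reports for `VGG16(weights="imagenet", include_top=False)`. Calling `conv_base_summary(VGG16_CONV_CONFIG, height=150, width=150)` instead yields a 4 × 4 × 512 feature map with the same parameter count, because only the pooling layers change the spatial size and they have no weights.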


We have neither this amount of data nor the resources to train this deep neural network ourselves. Instead, we will import the pre-trained network as a layer in our own model and use it for binary classification of cats and dogs. We prepare a dataset of 2,000 training images and 1,000 test and validation images of cats and dogs. VGG16 serves as the base of our model, and a Dense layer plus an output layer on top of it adapt the large network's features to our problem. This is what the network looks like:
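Only the new layers on top of VGG16 are meant to be trained, while the base keeps its pre-learned weights. A rough parameter budget shows why that makes training affordable. The head sizes in this sketch are assumptions, not values taken from the network described here: 150×150 RGB input images (so the base outputs a 4×4×512 feature map), a hypothetical 256-unit Dense layer, and a single sigmoid output unit for cat versus dog.

```python
# Parameter count of the frozen VGG16 convolutional base
# (all 13 convolutional layers, no fully connected layers).
FROZEN_BASE_PARAMS = 14_714_688


def dense_params(inputs, units):
    """A Dense layer has one weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


# ASSUMPTIONS (not stated in the article): 150x150 RGB inputs, so the base
# outputs a 4x4x512 feature map that is flattened before the head; a
# hypothetical 256-unit Dense layer; one sigmoid output unit (cat vs dog).
flattened = 4 * 4 * 512                 # 8,192 features
hidden = dense_params(flattened, 256)   # trainable
output = dense_params(256, 1)           # trainable
trainable = hidden + output
total = FROZEN_BASE_PARAMS + trainable

print(f"frozen:    {FROZEN_BASE_PARAMS:,}")
print(f"trainable: {trainable:,} ({trainable / total:.1%} of {total:,})")
# frozen:    14,714,688
# trainable: 2,097,665 (12.5% of 16,812,353)
```

Under these assumptions only about one parameter in eight is updated during training; the expensive convolutional features are reused as they are.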


As you can see, we have frozen the VGG16 part so that training does not update it, since we do not wish to retrain this giant network; it keeps the weights it already learned on ImageNet. The only trainable layers are the Dense layer and the output layer. Let’s train the model using the following parameters:

After about half an hour of training on a 4-core Intel i5 machine with 16 GB of RAM, the model produced the following training and validation graph.

Let’s check the accuracy of the model on our test set:

The model reached an accuracy of 89.4% on the test set and 94.4% on the training and validation sets. This is quite impressive. Let’s break this down into actual prediction counts on the test set of 1,000 cat and dog images, using a confusion matrix:

The two dark blue cells show the correctly identified cat and dog images. The error is quite low: the model correctly sorted 882 images into the two categories with little effort. Models like this can be added to a pipeline where incoming data is sorted on the fly by querying the model.
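Both ideas in the paragraph above are short in code. The plain-Python sketch below computes accuracy from a confusion matrix (correct predictions on the diagonal divided by all predictions) and turns a sigmoid output into a label for on-the-fly sorting. The counts are made-up toy numbers, not the results above; the function names, the 0.5 threshold, and the convention that an output near 1 means "dog" are assumptions for illustration.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def route_upload(dog_probability, threshold=0.5):
    """Hypothetical router: map a sigmoid output to a label (1.0 = dog)."""
    return "dog" if dog_probability >= threshold else "cat"


# Toy counts for illustration only (rows = true class, columns = predicted,
# order: cat, dog). These are NOT the article's results.
toy = [[8, 2],
       [1, 9]]
print(accuracy_from_confusion(toy))                  # 0.85
print([route_upload(p) for p in (0.91, 0.12, 0.5)])  # ['dog', 'cat', 'dog']
```

In a real pipeline, `route_upload` would receive the model's prediction for each uploaded image and the returned label would decide where the image is stored.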

Detecting Hand-Written Digits with Deep Learning

By Deep Learning

By Ammar Khan


Digitalization has given the 21st century enormous confidence that we can leave our services on autopilot. But what about the records collected with the technologies that came before digitization? Abandon them? Not only do they contain valuable information that needs to be archived digitally, but they also hold functional data for gaining insights from past trends. Here is an example from my own experience: a small police station in my area has four storage rooms filled with hand-written archived records. Since most new records are now entered digitally, what do we do about the old ones? Leave them as they are? Dump them? From the government's standpoint, neither is feasible. Transforming old records into digital form seems more reasonable. It will prolong the usability of the records, save a lot of physical space, and decrease the time required to find a particular record. For the sake of convenience, we are going to use a shallow network that can be trained on a laptop.

Problem Statement:

The problem we are going to solve with an Artificial Neural Network (ANN) is recognizing hand-written numeric digits. We chose the MNIST Handwritten-Digits dataset for this. It contains 60,000 training examples from about 250 different writers, plus 10,000 test examples for evaluating the model. An example from the dataset is given below.


The framework used to train the ANN is Keras. Every pixel in every image was normalized to the range 0 to 1, and the 2-D image matrices (28 × 28 pixels) were then flattened into 1-D vectors of 784 values. The 10 digit classes, i.e. 0 to 9, were encoded using one-hot encoding. For the layers, we opted for a small, easily trainable network that does not need a cluster to finish training in a reasonable time. These layers are:

· Dense layer with 784 neurons, activation is relu (input layer).

· Dense layer with 256 neurons, activation is relu.

· Dense layer with 64 neurons, activation is sigmoid.

· Dense layer with 10 neurons, activation is softmax (output layer).
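The preprocessing steps and layer sizes above can be checked with a few lines of plain Python. This sketch assumes 8-bit pixels in the range 0 to 255 (so normalizing means dividing by 255) and that the first Dense layer receives the flattened 784-value vector; the helper names are hypothetical stand-ins for what Keras and NumPy would normally do.

```python
def normalize(image):
    """Scale 8-bit pixel values (assumed 0-255) into the range 0-1."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def flatten(image):
    """Turn a 2-D image matrix into a 1-D vector, row by row."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=10):
    """Encode a digit 0-9 as a vector with a single 1 at that index."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector


def count_params(sizes):
    """Weights plus biases for a stack of Dense layers; sizes[0] is the input width."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# 784 flattened pixels feeding Dense layers of 784, 256, 64 and 10 neurons.
LAYER_SIZES = [784, 784, 256, 64, 10]

image = [[0] * 28 for _ in range(28)]   # placeholder 28x28 image
image[14][14] = 255                     # one fully "inked" pixel
x = flatten(normalize(image))
print(len(x), max(x))                   # 784 1.0
print(one_hot(3))                       # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(count_params(LAYER_SIZES))        # 833498
```

Each Dense layer has one weight per input-output pair plus one bias per neuron, which gives 833,498 trainable parameters in total, small enough to train on a laptop.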

Training Hyperparameters:

optimizer: RMSprop

loss function: binary_crossentropy

learning rate: 2e-5

epochs: 30

batch size: 20

validation set: 20%
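One note on the loss: with a 10-neuron softmax output and one-hot labels, categorical cross-entropy is the conventional choice, while binary cross-entropy scores each of the 10 outputs as an independent yes/no decision. The plain-Python sketch below evaluates both on a single made-up prediction and shows that a per-output binary accuracy can look high even when the predicted digit is wrong. It also works out the number of weight updates implied by the hyperparameters, assuming the 20% validation split is carved out of the 60,000 training examples. All function names are hypothetical.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Negative log of the probability given to the true class (one-hot y_true)."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


def binary_crossentropy(y_true, y_pred):
    """Mean of independent yes/no cross-entropies, one per output."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)


def elementwise_binary_accuracy(y_true, y_pred, threshold=0.5):
    """Share of the outputs that land on the correct side of the threshold."""
    hits = sum((p > threshold) == bool(t) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up example: the true digit is 3, but the softmax output favors 8.
y_true = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_pred = [0.02, 0.02, 0.02, 0.10, 0.02, 0.02, 0.02, 0.02, 0.74, 0.02]
predicted_digit = max(range(10), key=lambda i: y_pred[i])

print(predicted_digit)                                      # 8 (wrong)
print(round(categorical_crossentropy(y_true, y_pred), 3))   # 2.303
print(round(binary_crossentropy(y_true, y_pred), 3))
print(elementwise_binary_accuracy(y_true, y_pred))          # 0.8, despite the wrong digit

# Weight updates implied by the hyperparameters, assuming the 20% validation
# split is taken from the 60,000 training examples.
train_samples = 60_000 * 80 // 100                  # 48,000
steps_per_epoch = math.ceil(train_samples / 20)     # batch size 20 -> 2,400
total_updates = steps_per_epoch * 30                # 30 epochs -> 72,000
print(train_samples, steps_per_epoch, total_updates)  # 48000 2400 72000
```

The gap between the 0.8 per-output score and the wrong argmax prediction is why the digit-level (categorical) accuracy is the number to report for a 10-class problem.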


The model was neither overfitting nor underfitting, as shown in the figure below:

The model's accuracy on the test set was 97.11%, which is excellent for a shallow neural network.



The confusion matrix for the whole test set is shown below:



This example shows the capability of Deep Learning in image recognition for tasks that are labor-intensive. Rather than processing archives manually, we can train a model and let it do the work. To make this more streamlined and build a workflow, we can set up a pipeline with a Deep Learning model attached to each decision path. For example, one model's job is to draw bounding boxes around the things we want to process. These bounding boxes are then sent to other Deep Learning models that process the detected objects, and the final model handles a single category only. Why make it this elaborate, one may ask? Training a single larger network that does all of these jobs can be an extremely painful process, and designing it becomes more confusing because one layer might drop inputs that another needs. Each model in the pipeline can instead be attached and optimized for its own task, which is cleaner and more trainable, workable, and manageable.
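The chained pipeline described above can be sketched as plain function composition. In this sketch, `detect`, `classify`, and the entries of `recognizers` are hypothetical stand-ins for three separately trained models (a bounding-box detector, a category classifier, and a category-specific recognizer); only the wiring between them is the point.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), in pixels


def run_pipeline(image: Image,
                 detect: Callable[[Image], List[Box]],
                 classify: Callable[[Image], str],
                 recognizers: Dict[str, Callable[[Image], str]]):
    """Chain a detector, a category classifier and per-category recognizers."""
    results = []
    for left, top, right, bottom in detect(image):
        # Crop the region inside the bounding box.
        crop = [row[left:right] for row in image[top:bottom]]
        category = classify(crop)
        recognizer = recognizers.get(category)
        if recognizer is not None:  # skip categories no model handles
            results.append(((left, top, right, bottom), category, recognizer(crop)))
    return results


# Hypothetical stand-ins for three separately trained models.
def detect(image):
    return [(0, 0, 2, 2), (2, 0, 4, 2)]


def classify(crop):
    return "digit" if sum(map(sum, crop)) > 0 else "blank"


recognizers = {"digit": lambda crop: "7"}

page = [[0, 0, 1, 1],
        [0, 0, 1, 1]]
print(run_pipeline(page, detect, classify, recognizers))
# [((2, 0, 4, 2), 'digit', '7')]
```

Because each stage only sees the output of the previous one, any model can be retrained or swapped without touching the others.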