Day 2 – Introduction to Computer Vision

⬅️ Day 1 – Introduction to TensorFlow

On the first day, we discussed what Machine Learning is and how TensorFlow is used for Machine Learning. You can check my GitHub repository for updates. Today we’ll see how similar concepts can be used to identify patterns in images using computer vision.

What is Computer Vision?

Countless images are captured with cameras and smartphones every day, and hundreds of hours of video are uploaded to YouTube. The internet now holds billions of pieces of text and image data. While text is easy to index and search, algorithms need to understand what an image contains before they can index it. Even humans need to have seen something at least once before they can recognize it in an image. The goal of computer vision is to understand the content of digital images and videos.

To understand the concepts, let’s consider recognizing different clothing items. Most of us can tell just by looking whether an item is a shirt, a dress, or a pair of trousers. But imagine a person who has never seen clothing. How would they describe the differences between these items?

Similarly, machines will also need lots of different samples to gain experience and recognize clothing items. For this scenario, we will be using the very popular dataset called Fashion MNIST.

What is a Dataset?

A dataset is a collection of data. The data can be text, images, video, or even audio that is fed to a computer so it can learn to make predictions. You cannot just feed in any data and assume that the computer will understand it. Instead, you have to classify and label your data appropriately to get accurate results.

Fashion MNIST Dataset

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits put together by Yann LeCun, Corinna Cortes, and Christopher Burges. It is commonly used for training various image processing systems.

The database is also widely used for training and testing in the field of machine learning. It contains 70,000 grayscale images of handwritten digits from 0 to 9, each 28 x 28 pixels in size.

Fashion MNIST is a drop-in replacement for MNIST that contains images of 10 different types of clothing instead of digits. Each image is a square grid of 28 x 28 pixels, and each pixel is a value between 0 and 255. Let’s design a neural network using this dataset.

The labels correspond to the class of clothing the image represents:

Label   Class
0       T-shirt/top
1       Trouser
2       Pullover
3       Dress
4       Coat
5       Sandal
6       Shirt
7       Sneaker
8       Bag
9       Ankle boot
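When working with the labels in code, it helps to keep the class names in a list whose positions match the label numbers. The sketch below assumes the standard Fashion MNIST label order (0 for T-shirt/top through 9 for Ankle boot). CLASS_NAMES and label_to_name are names made up for this example and are not part of TensorFlow.

```python
# Class names in label order: index 0 is label 0, index 9 is label 9.
# CLASS_NAMES and label_to_name are illustrative helpers, not a TensorFlow API.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label):
    """Translate a numeric label (0-9) into its clothing class name."""
    return CLASS_NAMES[label]


print(label_to_name(9))  # Ankle boot
```

A lookup like this is handy later on, when the model returns a number and you want to show a readable name instead.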

Let’s now look at how a model can be trained on the Fashion MNIST dataset.

import tensorflow as tf

# Fashion MNIST ships with Keras, so no manual download is needed
data = tf.keras.datasets.fashion_mnist

(training_images, training_labels), (test_images, test_labels) = data.load_data()

# Scale pixel values from 0-255 down to 0-1
training_images = training_images / 255.0
test_images = test_images / 255.0

model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(128, activation=tf.nn.relu),
          tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

# The labels are integers (0-9), so we use the sparse categorical loss
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=10)

In the above example, data = tf.keras.datasets.fashion_mnist is an easy way to access built-in datasets of the Keras API. This way you can directly load the data from TensorFlow without worrying about downloading and splitting the data. We use the load_data method to return our training and testing data. It’ll return an array of 60,000 28 x 28-pixel arrays called the training_images and an array of 60,000 values (0-9) as training_labels. Similarly, the test_images array will contain 10,000 28 × 28-pixel arrays, and the test_labels array will contain 10,000 values between 0 and 9.

We use the Python operation training_images = training_images / 255.0 to normalize the images. Since the images are grayscale, with pixel values between 0 and 255, dividing by 255 ensures that every pixel is represented by a number between 0 and 1. Normalizing the data generally improves the performance of the model and helps it predict more accurate results.
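To see what normalization does to individual pixels, here is a small sketch in plain Python. The helper name normalize and the tiny 2 x 3 image are made up for illustration. With the NumPy arrays returned by load_data, the single division by 255.0 applies the same scaling to every pixel at once.

```python
def normalize(image):
    """Scale every pixel of a grayscale image from 0-255 down to 0.0-1.0."""
    return [[pixel / 255.0 for pixel in row] for row in image]


# A tiny 2 x 3 "image" with hypothetical pixel values, for illustration only.
tiny_image = [[0, 51, 102],
              [153, 204, 255]]

for row in normalize(tiny_image):
    print(row)
```

The darkest pixel (0) stays at 0.0 and the brightest (255) becomes 1.0, with everything else falling in between.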

model = tf.keras.models.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(128, activation=tf.nn.relu),
          tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

The model is built with Sequential, which lets us specify the layers of the model in order. The Flatten layer converts each 28 x 28 image into a single series of 784 numeric values. The middle layers are normally called hidden layers because they sit between the input and the output and are not seen by a caller. Here the hidden Dense layer contains 128 neurons, but this number is a choice: picking values like this is called hyperparameter tuning, and it takes some experimentation to find the ones that help the learning process. The final Dense layer has 10 neurons, one for each class of clothing. Each Dense layer also has an activation function. relu passes positive values through and replaces negative values with 0, while softmax turns the 10 outputs into probabilities that add up to 1.
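The flattening step and the two activation functions can be sketched in a few lines of plain Python. This is only an illustration of the ideas, not how Keras implements them internally, and the function names are chosen for this example.

```python
import math


def flatten(image):
    """Turn a 2-D grid of pixels into one long list, row by row."""
    return [pixel for row in image for pixel in row]


def relu(x):
    """Return x when it is positive, otherwise 0."""
    return max(0.0, x)


def softmax(values):
    """Convert raw scores into probabilities that add up to 1."""
    largest = max(values)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


blank_image = [[0.0] * 28 for _ in range(28)]
print(len(flatten(blank_image)))  # 784 values, one per pixel
print(relu(-2.5), relu(1.5))      # 0.0 1.5
print(softmax([1.0, 2.0, 3.0]))   # three probabilities that sum to 1
```

Note how softmax keeps the ordering of the scores: the largest score always ends up with the largest probability.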

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

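We then compile our model with a loss function and an optimizer. A categorical loss function is a good choice because we are trying to predict which of the 10 categories (0 to 9) a clothing item belongs to. Since the labels are plain integers, the sparse variant, sparse_categorical_crossentropy, is the one that fits. We use adam as the optimizer: it adapts the learning rate as training progresses, which generally gives good performance when training on 60,000 images.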
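To make the loss less abstract: for a single image, sparse categorical cross-entropy (a common categorical loss for integer labels) is the negative logarithm of the probability the model gave to the correct class. The sketch below is a plain-Python illustration with made-up probabilities. The function name sparse_categorical_loss is hypothetical, and Keras additionally averages this value over each batch.

```python
import math


def sparse_categorical_loss(probabilities, true_label):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probabilities[true_label])


# Hypothetical model outputs for a 3-class problem, for illustration only.
confident_and_right = [0.05, 0.90, 0.05]
unsure = [0.40, 0.30, 0.30]

print(sparse_categorical_loss(confident_and_right, 1))  # small loss
print(sparse_categorical_loss(unsure, 1))               # larger loss
```

The more probability the model puts on the right answer, the closer the loss gets to 0, which is why training tries to drive it down.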

You can also see that there’s a new parameter specifying the metrics. In the earlier chapter, we judged how well the model was learning by looking at how its loss was reduced. Here we also report accuracy, which is the fraction of images for which the predicted class matches the actual label.
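Accuracy is simple enough to compute by hand. Here is a minimal sketch, using made-up predictions and labels for five images. The function name accuracy is chosen for this example.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return matches / len(true_labels)


# Hypothetical predictions and labels for five images, for illustration only.
predicted = [9, 2, 1, 1, 6]
actual = [9, 2, 1, 1, 4]

print(accuracy(predicted, actual))  # 0.8
```

Four of the five predictions match, so the accuracy is 0.8, or 80%.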

We then call model.fit(training_images, training_labels, epochs=10) to start fitting the training_images to the training_labels for 10 epochs.

Epoch 8/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2579 - accuracy: 0.9042
Epoch 9/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2477 - accuracy: 0.9081
Epoch 10/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2392 - accuracy: 0.9099

You can see that the accuracy on the training data is 90.99% after training for 10 epochs.
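The 1875 at the start of each progress line is the number of batches processed per epoch. Assuming the Keras default batch size of 32 (no batch_size was passed to the training call), the arithmetic can be checked with a short sketch. The helper name steps_per_epoch is made up for this example.

```python
import math


def steps_per_epoch(num_samples, batch_size=32):
    """Number of batches needed to go through the data once."""
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(60000))  # 1875 batches for the 60,000 training images
print(steps_per_epoch(10000))  # 313 batches for the 10,000 test images
```

The second number uses the same arithmetic for a set of 10,000 images, where the last batch is only partly full.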

model.evaluate(test_images, test_labels)

Next we can evaluate the model using the set of images and labels reserved for testing. We feed the test images to the trained model, compare the class it predicts for each image with the actual label, and measure the accuracy.

313/313 [==============================] - 1s 2ms/step - loss: 0.3434 - accuracy: 0.8796

Here you can see that the accuracy on the test dataset is 87.96%. This value is lower than the accuracy we got on our training dataset, which is expected because the model has never seen the test images before. Training with more data can help the neural network learn more and narrow this gap.
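A quick way to keep an eye on this difference is to compute it directly. The sketch below uses the training and test accuracy reported in the output in this article. The helper name accuracy_gap is made up for this example.

```python
def accuracy_gap(train_accuracy, test_accuracy):
    """Difference between training and test accuracy, in percentage points."""
    return (train_accuracy - test_accuracy) * 100


# Accuracy values taken from the training and evaluation output in this article.
gap = accuracy_gap(0.9099, 0.8796)
print(f"{gap:.2f} percentage points")  # 3.03 percentage points
```

A small gap like this is normal. A gap that keeps growing as you train longer is a sign the model is memorizing the training data.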

classifications = model.predict(test_images)

Next, we can make predictions with our test images. model.predict() returns one classification per test image. Printing the first one, classifications[0], lets us compare it with its test label, test_labels[0].

[4.9832340e-07 1.2591865e-07 3.3560777e-08 3.4377343e-09 5.0859381e-08
 7.8804273e-04 3.8435410e-06 6.7682886e-03 1.7040855e-05 9.9242204e-01]

The classification is an array of values from the 10 output neurons. Each value is the probability that the image belongs to the class at that particular index. In this example, the neural network is reporting a 99.24% probability that the first test image (index 0) belongs to class 9, which is an Ankle boot.
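Reading the predicted class out of such an array means finding the index of the largest value. Here is a minimal sketch using the output printed above, copied into a plain Python list. With the NumPy array returned by model.predict, a helper such as numpy.argmax does the same job.

```python
# Output of the 10 neurons for the first test image, copied from the article.
probabilities = [
    4.9832340e-07, 1.2591865e-07, 3.3560777e-08, 3.4377343e-09, 5.0859381e-08,
    7.8804273e-04, 3.8435410e-06, 6.7682886e-03, 1.7040855e-05, 9.9242204e-01,
]

# The predicted class is the index holding the largest probability.
predicted_label = max(range(len(probabilities)), key=lambda i: probabilities[i])

print(predicted_label)                          # 9
print(f"{probabilities[predicted_label]:.2%}")  # 99.24%
```

The ten values also add up to roughly 1, as expected from the softmax activation on the output layer.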

You can try training the model for a different number of epochs and see how it behaves. In some cases the training accuracy becomes very high while predictions on new data get worse. This problem is called overfitting. We’ll discuss techniques to overcome it in future chapters.

Next, let’s see how computers can learn better through a process called convolution, which uses other features of an image rather than just its raw pixels. See you in the next chapter. Happy coding! 😃🔥

Day 3 – Convolutional Neural Network ➡️