In the last chapter, we discussed how a model can be improved to detect other features in an image using convolutions. You can check my GitHub repository for updates. Today we’ll use a more complex dataset with color images, and use convolutions to spot the features of the image.
The Horses or Humans Dataset
We will use the Horses or Humans dataset, which contains over a thousand 300 x 300-pixel images rendered in different poses. If you browse the dataset, you will see that the subjects appear in different orientations and poses, against backgrounds such as trees and beaches. The classifier has to cope with this variation and determine whether a new image contains a horse or a human.
The Keras ImageDataGenerator
The Fashion MNIST dataset we used earlier came with its images already labeled. This dataset instead stores the two types of images in two subdirectories, with no explicit labels. We use the ImageDataGenerator tool from Keras in TensorFlow to assign labels to the images automatically, based on the subdirectory each image is in.
The Horses or Humans dataset is available as a set of ZIP files. You can download them through this link and unpack them into local directories for training and validation.
The code below sets the file path to the training dataset and shows how ImageDataGenerator can be used.
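from tensorflow.keras.preprocessing.image import ImageDataGenerator

training_dir = '/content/horse-or-human/training'

# Rescale pixel values from the range 0-255 to the range 0-1
train_datagen = ImageDataGenerator(rescale=1/255)

# Read images from the subdirectories of training_dir, resize them to
# 300 x 300, and assign a binary label to each one
train_generator = train_datagen.flow_from_directory(
    training_dir,
    target_size=(300, 300),
    class_mode='binary'
)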
First, an instance of ImageDataGenerator is created as train_datagen. The images then flow from training_dir for the training process. A few parameters are specified, such as the target image size and the class_mode. Since there are only two types of images, class_mode is set to binary.
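As a rough illustration of how labels can come from subdirectories, the sketch below mimics the convention flow_from_directory follows: subdirectory names are sorted and numbered from 0. The helper infer_class_indices and the folder names horses and humans are assumptions made for this example; they are not part of Keras.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Map each subdirectory name to an integer label, in sorted order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Build a throwaway directory tree with one (assumed) folder per class
with tempfile.TemporaryDirectory() as root:
    for name in ("humans", "horses"):
        os.makedirs(os.path.join(root, name))
    class_indices = infer_class_indices(root)

print(class_indices)
```

With the real generator, you can check the actual mapping through train_generator.class_indices.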
Compared with the earlier dataset, the images are now much larger at 300 x 300 pixels, so more layers are needed. These are color images, so each image has three channels instead of one. Lastly, this is a binary classifier, so there is only a single output neuron, which gives a value close to 0 for one class and close to 1 for the other.
Below is the convolutional neural network that we will train on the training dataset.
import tensorflow as tf

model = tf.keras.models.Sequential([
    # Five convolution + pooling stages progressively shrink the 300 x 300 x 3 input
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    # A single sigmoid neuron for binary classification
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Newer TensorFlow versions name this argument learning_rate instead of lr
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(lr=0.001),
              metrics=['accuracy'])

# Newer TensorFlow versions accept the generator directly in model.fit
history = model.fit_generator(
    train_generator,
    epochs=15
)
Notice that the input_shape in the first layer is (300, 300, 3): the images are 300 x 300 pixels and in color, so there are three channels representing the RGB values.
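To see why five convolution and pooling stages suit a 300 x 300 input, the sketch below traces the spatial size through the network. It assumes the Keras defaults used in the model above: 3 x 3 convolutions without padding and 2 x 2 pooling. The helper names conv_output and pool_output are made up for this example.

```python
def conv_output(size, kernel=3):
    """Spatial size after an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after max pooling; odd sizes are rounded down."""
    return size // pool


size = 300
filters = [16, 32, 64, 64, 64]   # filter counts of the five Conv2D layers
sizes = []
for _ in filters:
    size = pool_output(conv_output(size))
    sizes.append(size)

# Number of values handed to the Dense layer after Flatten
flattened = size * size * filters[-1]

# Weights in the first Conv2D layer: (3 x 3 kernel x 3 channels + 1 bias) per filter
first_conv_params = (3 * 3 * 3 + 1) * filters[0]

print(sizes)              # [149, 73, 35, 16, 7]
print(flattened)          # 3136
print(first_conv_params)  # 448
```

Each stage roughly halves the width and height, so by the time the data reaches Flatten the 300 x 300 image has been reduced to 7 x 7 feature maps.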
In the end, there is only one neuron in the output layer, since this is a binary classifier as discussed earlier. The sigmoid function drives the output toward 0 for one class and toward 1 for the other, which is well suited to binary classification.
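A minimal sketch of the sigmoid function in plain Python shows how it squashes any input into the range 0 to 1. The input values and the 0.5 threshold are chosen only for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Strongly negative inputs land near 0, strongly positive inputs near 1
for z in (-6.0, 0.0, 6.0):
    p = sigmoid(z)
    label = 1 if p > 0.5 else 0
    print(z, round(p, 4), label)
```

An input of exactly 0 gives 0.5, which is why 0.5 is the natural cut-off between the two classes.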
Since there are only two classes, we use binary_crossentropy as the loss function. We are also trying out a new optimizer, root mean square propagation (RMSprop), which takes a learning rate (lr) parameter.
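For intuition about what this loss measures, here is a small plain-Python sketch of binary cross-entropy averaged over a batch. It is not the Keras implementation; the clipping value eps and the sample predictions are assumptions made for the example.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average binary cross-entropy over a batch of (label, probability) pairs."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the probability so that log(0) can never occur
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Confident and correct predictions give a loss close to 0
confident_right = binary_crossentropy([1, 0], [0.99, 0.01])

# Confident but wrong predictions are penalized heavily
confident_wrong = binary_crossentropy([1, 0], [0.01, 0.99])

print(confident_right, confident_wrong)
```

The loss grows sharply as a prediction moves toward the wrong end of the 0 to 1 range, which pushes the sigmoid output toward the correct class during training.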
We call fit_generator and pass it train_generator, which was created earlier, to train the model on the images and their automatically assigned labels. Older versions of TensorFlow require you to use model.fit_generator when using generators; newer versions accept the generator directly in model.fit.
Epoch 13/15
33/33 [==============================] - 92s 3s/step - loss: 0.2355 - accuracy: 0.9766
Epoch 14/15
33/33 [==============================] - 92s 3s/step - loss: 6.6581e-04 - accuracy: 1.0000
Epoch 15/15
33/33 [==============================] - 91s 3s/step - loss: 6.8682e-05 - accuracy: 1.0000
After 15 epochs, the model reaches 100% accuracy on the training data. But to measure the actual performance of the model, we have to use a validation dataset and see how it performs on images it has not been trained on.
Next, we check the performance of the model using a validation dataset. You can either set aside part of your training dataset for this or, as in our case, use a separate directory for the validation dataset. Note that validation data is used to see how the network is doing with previously unseen data while you are training, whereas test data is used after training the model.
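One possible way to wire in the validation directory is sketched below. It mirrors the training generator, but the helper make_validation_generator and the path in VALIDATION_DIR are assumptions for this example, so adjust the path to wherever you unpacked the validation ZIP file.

```python
def make_validation_generator(validation_dir, image_size=(300, 300)):
    """Build a generator that reads validation images from their subdirectories."""
    # Imported here so the function can be defined without loading TensorFlow
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Apply the same 0-1 rescaling as the training images
    validation_datagen = ImageDataGenerator(rescale=1/255)
    return validation_datagen.flow_from_directory(
        validation_dir,
        target_size=image_size,
        class_mode='binary'
    )


# Assumed location of the unpacked validation images
VALIDATION_DIR = '/content/horse-or-human/validation'

# Usage, given the model and train_generator defined earlier:
# validation_generator = make_validation_generator(VALIDATION_DIR)
# history = model.fit(train_generator, epochs=15,
#                     validation_data=validation_generator)
```

When validation data is passed to training this way, Keras reports val_loss and val_accuracy next to the training metrics after every epoch.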
After training for 15 epochs, you can see that the validation accuracy is sometimes quite low. You might realize at this point that the model is lacking a bit of data. We will see how to overcome these issues in future chapters.
Next, let’s see how we can use this model to test new images.
import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image

# Upload one or more images through the Colab file picker
uploaded = files.upload()

for fn in uploaded.keys():
    # Predicting images
    path = '/content/' + fn
    img = image.load_img(path, target_size=(300, 300))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)

    image_tensor = np.vstack([x])
    classes = model.predict(image_tensor)
    print(classes)
    # Compare the first (and only) prediction with the 0.5 threshold
    if classes[0] > 0.5:
        print(fn + " is a human")
    else:
        print(fn + " is a horse")
img = image.load_img(path, target_size=(300, 300))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
image_tensor = np.vstack([x])
The first line loads the image from the path where Colab stored the upload. We set target_size to 300 x 300, so the image is resized to match the size the model was trained on.
The second line converts the image into a 3D array of shape (300, 300, 3). The model expects a batch of images, which is a 4D array, so expand_dims adds a new leading dimension. Finally, np.vstack stacks the arrays vertically into image_tensor so that it has the same layout as the training data.
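The effect of these steps on the array shape can be checked with a dummy image. This sketch uses an all-zero array in place of a real photo, which is an assumption made purely to show the shapes.

```python
import numpy as np

# Stand-in for image.img_to_array(img): height x width x channels
x = np.zeros((300, 300, 3), dtype="float32")

# Add a leading batch dimension so the array holds one image
batched = np.expand_dims(x, axis=0)

# Stacking a single array leaves the shape unchanged
image_tensor = np.vstack([batched])

# Stacking two arrays would give a batch of two images
two_images = np.vstack([batched, batched])

print(x.shape)             # (300, 300, 3)
print(batched.shape)       # (1, 300, 300, 3)
print(image_tensor.shape)  # (1, 300, 300, 3)
print(two_images.shape)    # (2, 300, 300, 3)
```

With only one image, vstack is effectively a no-op; it becomes useful when several images are stacked into one batch for prediction.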
classes = model.predict(image_tensor)
print(classes)
if classes[0] > 0.5:
    print(fn + " is a human")
else:
    print(fn + " is a horse")
The model returns an array containing the classifications. Since we passed a single image, we only need to inspect the first element of that array. If it is greater than 0.5, the image is classified as a human; otherwise it is classified as a horse.
There might be times when it’s hard to distinguish the features of the images and predict the results accurately since the dataset is small. One way to overcome this is through image augmentation. With image augmentation, you can broaden your training dataset with a variety of transformations such as:
- Rotating
- Shifting horizontally
- Shifting vertically
- Shearing
- Zooming
- Flipping horizontally
Similar to how we rescaled the images with ImageDataGenerator, we can add additional parameters when loading the dataset.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)
The model might take a bit longer to train with these new parameters. You will also notice that the training accuracy of the new model is slightly lower, while the validation accuracy is higher. This is because the earlier model was overfitting. If you run a prediction now, the results might be better than what we got from the earlier model.
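To build some intuition for what augmentation does to an image, the toy sketch below applies a horizontal flip and a one-pixel horizontal shift to a tiny grid of numbers. It is a simplified stand-in, not how ImageDataGenerator is implemented: Keras picks random amounts for each transformation, and the helper names here are made up for the example.

```python
def flip_horizontal(img):
    """Mirror every row of the image from left to right."""
    return [row[::-1] for row in img]


def shift_right(img, pixels=1):
    """Shift every row right by pixels (at least 1).

    The gap on the left is filled with the nearest edge pixel,
    similar in spirit to fill_mode='nearest'.
    """
    return [[row[0]] * pixels + row[:-pixels] for row in img]


# A tiny 2 x 3 "image" with one value per pixel
tiny_image = [
    [1, 2, 3],
    [4, 5, 6],
]

flipped = flip_horizontal(tiny_image)
shifted = shift_right(tiny_image)

print(flipped)  # [[3, 2, 1], [6, 5, 4]]
print(shifted)  # [[1, 1, 2], [4, 4, 5]]
```

Each transformed copy still shows the same subject, so the model sees more varied examples of each class without any new photos being collected.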
In the next chapter let’s learn how artificial intelligence can be used to understand human-based language. Happy coding! 😃🔥