Encoding Images with Autoencoders

1. Overview

An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise”. Along with the reduction side, a reconstructing side is learned, where the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name.
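Conceptually, the whole model is just two functions composed together. A minimal sketch of the training objective, with hypothetical encode and decode functions standing in for the two halves:

# encode/decode are placeholders for the two halves of the network
code = encode(x)                   # compress x to a low-dimensional code
x_hat = decode(code)               # reconstruct the input from the code
loss = ((x - x_hat) ** 2).mean()   # reconstruction error to minimize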

2. Create a simple autoencoder

2.1. Dataset

In this simple example, I will use the Fashion-MNIST dataset.

from tensorflow.keras.datasets import fashion_mnist

2.2. Define model

import tensorflow as tf

INPUT_DIM = 784
ENCODE_DIM = 128

# Encoder: 784 -> 512 -> 256 -> 128
inputs = tf.keras.layers.Input(shape=(INPUT_DIM,))
encoder = tf.keras.layers.Dense(units=512)(inputs)
encoder = tf.keras.layers.ReLU()(encoder)
encoder = tf.keras.layers.Dense(units=256)(encoder)
encoder = tf.keras.layers.ReLU()(encoder)
encoder = tf.keras.layers.Dense(ENCODE_DIM)(encoder)

# The 128-dimensional bottleneck code
encoding = tf.keras.layers.ReLU()(encoder)

# Decoder: 128 -> 256 -> 512 -> 784
decoder = tf.keras.layers.Dense(units=256)(encoding)
decoder = tf.keras.layers.ReLU()(decoder)
decoder = tf.keras.layers.Dense(units=512)(decoder)
decoder = tf.keras.layers.ReLU()(decoder)
decoder = tf.keras.layers.Dense(units=INPUT_DIM)(decoder)
# Sigmoid keeps reconstructed pixel values in [0, 1]
outputs = tf.keras.layers.Activation('sigmoid')(decoder)

model = tf.keras.models.Model(inputs=inputs, outputs=outputs)
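If you also want the 128-dimensional codes by themselves rather than the reconstructions, you can wrap the encoding layer in its own model; it shares weights with model, so it needs no separate training:

# Standalone encoder that maps an input vector to its 128-dim code
encoder_only = tf.keras.models.Model(inputs=inputs, outputs=encoding)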

2.3. Training

# We use only the image data
(X_train, _), (X_test, _) = fashion_mnist.load_data()

# Normalize
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# Flatten each 28x28 image to a 784-dim vector
X_train = X_train.reshape((X_train.shape[0], -1))
X_test = X_test.reshape((X_test.shape[0], -1))

# Compile the model
model.compile(optimizer='adam', loss='mse')

EPOCHS = 300
BATCH_SIZE = 1024

model.fit(
    X_train, X_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    shuffle=True,
    validation_data=(X_test, X_test))
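Once training finishes, you can report the final reconstruction error on the held-out images:

# Mean squared reconstruction error on the test set
test_loss = model.evaluate(X_test, X_test, batch_size=BATCH_SIZE)
print('Test MSE:', test_loss)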

2.4. Evaluation

predictions = model.predict(X_test)

import matplotlib.pyplot as plt

plt.imshow(predictions[0].reshape((28, 28)), cmap="gray")
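To compare inputs and reconstructions side by side, a small helper like the following works (the number of displayed images, n, is arbitrary):

n = 8  # number of test images to display
fig, axes = plt.subplots(2, n, figsize=(2 * n, 4))
for i in range(n):
    axes[0, i].imshow(X_test[i].reshape((28, 28)), cmap='gray')       # raw
    axes[1, i].imshow(predictions[i].reshape((28, 28)), cmap='gray')  # generated
    axes[0, i].axis('off')
    axes[1, i].axis('off')
plt.show()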

A few experiment results: a table pairing each raw test image with its generated reconstruction (8 samples; the images are not reproduced here).

2.5. How it works

  • The encoder takes a raw input of size 784 and compresses it down to size 128 (see the quick shape check after this list).
  • The decoder takes the compressed input of size 128 and decodes it back to size 784.
  • Training minimizes the error distance between the raw inputs and the generated outputs.
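You can verify the compression with the standalone encoder_only model from section 2.2: each 784-dimensional input becomes a 128-dimensional code, roughly a 6x reduction.

codes = encoder_only.predict(X_test[:5])
print(codes.shape)             # (5, 128)
print(INPUT_DIM / ENCODE_DIM)  # 784 / 128 = 6.125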

3. Create a CNN autoencoder

3.1. Dataset

I will use the CINIC-10 dataset (32x32 RGB images) in this example.

import tensorflow as tf
import numpy as np
import glob

train_paths = glob.glob('inputs/train/*/*.png')
valid_paths = glob.glob('inputs/valid/*/*.png')

def load_img(img_path):
    image = tf.io.read_file(img_path)
    # CINIC-10 images are PNG files
    image = tf.image.decode_png(image, channels=3)
    # convert_image_dtype already rescales uint8 pixels to [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)

    # Input and target are the same image
    return image, image

# Prepare dataset
train_dataset = (tf.data.Dataset
    .from_tensor_slices(train_paths)
    .shuffle(1024)
    .map(load_img)
    .batch(128)
    .prefetch(1024))

valid_dataset = (tf.data.Dataset
    .from_tensor_slices(valid_paths)
    .map(load_img)
    .batch(128)
    .prefetch(1024))
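Before training, it is worth pulling one batch to confirm the pipeline produces pairs of image tensors with values in [0, 1]:

# Sanity-check one batch of (input, target) pairs
for images, targets in train_dataset.take(1):
    print(images.shape, float(images.numpy().min()), float(images.numpy().max()))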

3.2. Define model

ENCODING_DIM = 256

# CINIC-10 images are 32x32 RGB
input_layer = tf.keras.layers.Input(shape=(32, 32, 3))
encoder = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=2, padding='same')(input_layer)
encoder = tf.keras.layers.LeakyReLU(alpha=0.2)(encoder)
encoder = tf.keras.layers.BatchNormalization()(encoder)

encoder = tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), strides=2, padding='same')(encoder)
encoder = tf.keras.layers.LeakyReLU(alpha=0.2)(encoder)
encoder = tf.keras.layers.BatchNormalization()(encoder)

encoder_output_shape = encoder.shape
encoder = tf.keras.layers.Flatten()(encoder)
encoder_output = tf.keras.layers.Dense(ENCODING_DIM)(encoder)
encoder_model = tf.keras.models.Model(inputs=input_layer, outputs=encoder_output)

decoder_input = tf.keras.layers.Input(shape=(ENCODING_DIM,))
target_shape = tuple(encoder_output_shape[1:])
decoder = tf.keras.layers.Dense(np.prod(target_shape))(decoder_input)
decoder = tf.keras.layers.Reshape(target_shape)(decoder)

decoder = tf.keras.layers.Conv2DTranspose(filters=64, kernel_size=(3, 3), strides=2, padding='same')(decoder)
decoder = tf.keras.layers.LeakyReLU(alpha=0.2)(decoder)
decoder = tf.keras.layers.BatchNormalization()(decoder)

decoder = tf.keras.layers.Conv2DTranspose(filters=32, kernel_size=(3, 3), strides=2, padding='same')(decoder)
decoder = tf.keras.layers.LeakyReLU(alpha=0.2)(decoder)
decoder = tf.keras.layers.BatchNormalization()(decoder)

decoder = tf.keras.layers.Conv2DTranspose(filters=3, kernel_size=(3, 3), padding='same')(decoder)
outputs = tf.keras.layers.Activation('sigmoid')(decoder)

decoder_model = tf.keras.models.Model(inputs=decoder_input, outputs=outputs)

encoder_model_outputs = encoder_model(input_layer)
decoder_model_outputs = decoder_model(encoder_model_outputs)
autoencoder_model = tf.keras.models.Model(inputs=input_layer, outputs=decoder_model_outputs)

autoencoder_model.compile(optimizer='adam', loss='mse')
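A quick check that the decoder mirrors the encoder: the model's output shape should match the 32x32x3 input.

autoencoder_model.summary()
print(autoencoder_model.output_shape)  # expected: (None, 32, 32, 3)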

3.3. Training

model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath='./model.{epoch:02d}-{val_loss:.9f}.hdf5',
    save_weights_only=False,
    save_best_only=True,
    monitor='val_loss')

autoencoder_model.fit(
    train_dataset,
    validation_data=valid_dataset,
    epochs=300,
    callbacks=[model_checkpoint_callback])
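300 epochs on CINIC-10 takes a while; one option I did not use in the runs above is to stop early once the validation loss plateaus, by adding an EarlyStopping callback next to the checkpoint callback:

# Optional: stop when val_loss has not improved for 10 epochs
early_stopping_callback = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True)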

3.4. Evaluation

To save training time, you can use my pretrained model. Download

autoencoder_model = tf.keras.models.load_model('model.cinic.hdf5')
predictions = autoencoder_model.predict(valid_dataset)

A few experiment results: a table pairing each raw CINIC-10 image with its generated reconstruction (8 samples; the images are not reproduced here).

3.5. How it works

  • The encoder uses Conv2D layers to encode the raw image of size 32x32x3 (RGB channels) into an encoding vector of size 256.

  • The decoder uses Conv2DTranspose layers to decode the encoding vector into an output image of size 32x32x3. Learn more about how Conv2DTranspose upsamples in the sketch after this list.
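Conv2DTranspose is the learned upsampling counterpart of Conv2D: with strides=2 and padding='same' it doubles the spatial dimensions, as a quick shape check shows.

x = tf.zeros((1, 8, 8, 64))  # dummy feature map
up = tf.keras.layers.Conv2DTranspose(
    filters=32, kernel_size=(3, 3), strides=2, padding='same')(x)
print(up.shape)  # (1, 16, 16, 32): height and width doubled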

4. Create an inverse image search index

4.1. Index the CINIC-10 dataset

def euclidean_dist(x, y):
    return np.linalg.norm(x - y)


# Batch size 1 and no shuffling, so feature i lines up with image i
index_dataset = (tf.data.Dataset
    .from_tensor_slices(train_paths)
    .map(load_img)
    .batch(1)
    .prefetch(1024))

features = encoder_model.predict(index_dataset)

search_index = {
    'features': features,
    'dataset': index_dataset
}


def search(query_vector, search_index, max_results=8):
    vectors = search_index['features']
    results = []

    # Compute the distance from the query to every indexed image
    for i, (image, _) in enumerate(search_index['dataset']):
        distance = euclidean_dist(query_vector, vectors[i])
        results.append((distance, image.numpy()))

    # Keep the max_results closest images
    results = sorted(results, key=lambda p: p[0])[:max_results]

    return results
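Putting it together: encode a query image with the encoder half, then rank the indexed images by distance. The query path below is only a placeholder; substitute any image from the dataset.

# Hypothetical query path, for illustration only
query_image, _ = load_img('inputs/valid/airplane/example.png')
query_vector = encoder_model.predict(tf.expand_dims(query_image, axis=0))[0]

results = search(query_vector, search_index)
for distance, image in results:
    print(distance, image.shape)  # image keeps its batch dimension of 1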

4.2. Experiment

Example results: a table pairing each query image with its nearest indexed images (2 sample queries; the images are not reproduced here).