## Variational Fashion Encoder

In this experiment I want to try out Variational Auto Encoders, an unsupervised feature learning algorithm, on the new fashion classification dataset from Zalando. Since I wanted to play around with both, I will build a generic Variational Auto Encoder and then train it on the new Fashion-MNIST dataset.

Code can be found on GitHub.

### Variational Auto Encoder

The bottom part of the model, the encoder, embeds the input X into a mean vector and a variance vector.

The mean and variance represent the parameters of a Gaussian that is trained to be close to a standard normal distribution N(0, I). The decoder network tries to reconstruct the input from a sample drawn from this distribution. Both the encoder and the decoder network can have multiple layers, in my case 3 each.
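
As a minimal sketch of the sampling step described above (in plain NumPy, with an assumed 20-dimensional latent space; the post does not state the latent size), the encoder's mean and log-variance outputs are turned into a latent sample via the usual reparameterization trick:

```python
import numpy as np

def reparameterize(mu, log_var, seed=0):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    Taking the encoder output as log-variance (an assumption, but a
    common choice) keeps the network output unconstrained and makes
    exp(0.5 * log_var) a valid standard deviation.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(mu.shape)  # noise from a standard normal
    return mu + np.exp(0.5 * log_var) * eps

mu = np.zeros((1, 20))       # assumed 20-dimensional latent space
log_var = np.zeros((1, 20))  # log-variance 0 means sigma = 1
z = reparameterize(mu, log_var)
```

Sampling this way, instead of sampling from N(mu, sigma²) directly, keeps the path from the encoder outputs to z differentiable, which is what makes end-to-end training possible.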

The loss function combines a reconstruction term with the Kullback-Leibler divergence between the distribution produced by the encoder network and a standard normal distribution, which centers the latent space around 0 with standard deviation 1. This can be seen as a form of regularisation. Furthermore, the decoder does not reconstruct directly from the encoder output but from a sample of the distribution in the latent space.
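
A hedged sketch of this loss in NumPy, assuming pixel values in [0, 1] and a Bernoulli (binary cross-entropy) reconstruction term, which is a common choice for MNIST-like data but not stated above:

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    """Reconstruction error plus the closed-form KL divergence
    between N(mu, sigma^2) and the standard normal N(0, I)."""
    eps = 1e-7  # numerical stability for the log
    recon = -np.sum(x * np.log(x_recon + eps)
                    + (1 - x) * np.log(1 - x_recon + eps))
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return recon + kl

# With mu = 0 and log_var = 0 the latent code already matches N(0, I),
# so the KL term vanishes and only the reconstruction error remains.
x = np.array([[0.0, 1.0]])
loss = vae_loss(x, np.array([[0.1, 0.9]]), np.zeros((1, 2)), np.zeros((1, 2)))
```

The KL term is what pulls the latent codes toward the standard normal; without it the encoder could spread embeddings arbitrarily and the regularising effect described above would be lost.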

### Fashion Reconstruction

The images in the fashion dataset are 28 by 28 pixels, just like in the MNIST dataset.

The fashion data consists of 10 classes of clothing:

- tshirt
- trouser
- pullover
- dress
- coat
- sandal
- shirt
- sneaker
- bag
- ankle boot
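
A small helper mapping label indices to these names, assuming the index order published with the Fashion-MNIST dataset:

```python
# Class names in the order listed above (assumed to match the
# label indices 0-9 shipped with the Fashion-MNIST dataset).
CLASSES = [
    "tshirt", "trouser", "pullover", "dress", "coat",
    "sandal", "shirt", "sneaker", "bag", "ankle boot",
]

def class_name(label):
    """Return the human-readable class name for a label index."""
    return CLASSES[label]
```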

After training for 5 epochs on the training set, I plotted the reconstructions from the Variational Auto Encoder.

As one can see, the reconstructions work quite well, indicating that the learned latent space is useful.
