Introduction
This is one of those classic but fun deep-learning projects that most DL practitioners implement at some point, both for its relative simplicity and for the fun visual outputs it produces.
I had a few purposes when I started working on this project:
- I wanted to get a better handle on Keras and TensorFlow
- I wanted to give Streamlit a try: a Python library for building quick interactive apps
- I wanted to create fun pictures to decorate my house :)
The paper describing the Neural Style Transfer technique was written by Gatys et al. in 2015.
Content
What I found fascinating the first time I read this paper was that back-propagation is applied not to update the weights of the network, but to update the input image itself.
Of course, a crucial trick is that the network has to be pretrained on a large set of images, and the authors also made an inspired choice of loss function.
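This "optimize the input, not the weights" idea can be shown on a toy problem (a minimal sketch with a made-up linear map standing in for the pretrained network, not the paper's actual architecture):

```python
import numpy as np

# Toy illustration: gradient descent on the *input*, with the "pretrained"
# weights W kept frozen. We optimize x so that the fixed map W reproduces
# a target feature response, exactly as NST optimizes the input image.
W = np.array([[1.0, 0.5],
              [0.0, 1.0]])         # frozen weights, stand-in for the CNN
target = np.array([1.0, 2.0])      # desired encoded features

x = np.zeros(2)                    # the "image" we are synthesizing
lr = 0.1
for _ in range(500):
    residual = W @ x - target
    x -= lr * (W.T @ residual)     # gradient w.r.t. x, not w.r.t. W

loss = 0.5 * np.sum((W @ x - target) ** 2)
print(loss)  # close to 0: the input now matches the target encoding
```

In the real project the gradient with respect to the input is obtained the same way, just through the full VGG feature extractor instead of a 2x2 matrix.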
It is quite clever:

$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha \, \mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta \, \mathcal{L}_{style}(\vec{a}, \vec{x})$$

where $\vec{p}$ is the photograph, $\vec{a}$ is the artwork, and $\vec{x}$ is the input. Both loss terms are squared-error losses.
The content part minimizes the squared difference between the encoded feature representations of the photograph and the input at a certain depth $l$ of the network:

$$\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2} \sum_{i,j} \left( F^l_{ij} - P^l_{ij} \right)^2$$

where $F^l$ and $P^l$ are the feature representations of the input and of the photograph at layer $l$. For my implementation I used block4_conv2.
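The content term boils down to a few lines of NumPy (a minimal sketch with made-up feature maps; in the real implementation F and P come from the block4_conv2 activations):

```python
import numpy as np

def content_loss(F, P):
    """Half the sum of squared differences between the feature maps of
    the input (F) and of the photograph (P) at the chosen layer."""
    return 0.5 * np.sum((F - P) ** 2)

# Tiny made-up feature maps, just to show the computation:
F = np.array([[1.0, 2.0], [3.0, 4.0]])
P = np.array([[1.0, 1.0], [3.0, 2.0]])
print(content_loss(F, P))  # 0.5 * (0 + 1 + 0 + 4) = 2.5
```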
The style part minimizes the squared error between the style matrix of the artwork and that of the input, for a layer $l$:

$$E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2$$

Then:

$$\mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_l w_l E_l$$

where the style matrix is simply the dot product of the encoded feature representation matrix with its transpose, at a given layer $l$:

$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$

or, in matrix form, $G^l = F^l \left(F^l\right)^\top$.
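The style matrix and the per-layer style loss look like this in NumPy (a sketch on tiny made-up matrices; in practice F has one row per filter and one column per spatial position of the layer's activations):

```python
import numpy as np

def gram_matrix(F):
    # F has shape (N_l, M_l): N_l filters, each flattened to M_l positions
    return F @ F.T

def style_layer_loss(F, A_feat):
    """Squared error between the style matrices of the input (F) and of
    the artwork (A_feat), with the 1 / (4 * N_l^2 * M_l^2) scaling."""
    N, M = F.shape
    G = gram_matrix(F)       # style matrix of the input
    A = gram_matrix(A_feat)  # style matrix of the artwork
    return np.sum((G - A) ** 2) / (4.0 * N ** 2 * M ** 2)

F = np.array([[1.0, 0.0], [0.0, 1.0]])
A_feat = np.array([[1.0, 1.0], [0.0, 1.0]])
print(style_layer_loss(F, A_feat))  # 3 / 64 = 0.046875
```

The total style loss is then just the weighted sum of `style_layer_loss` over the chosen layers.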
So in the end there are a lot of hyperparameters and weights that can be tuned manually to try to obtain better results.
Which is why I thought it would be nice to have a UI to do it, hence Streamlit!
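The knobs the UI exposes are essentially the loss weights; for example (the default values here are illustrative, not the paper's):

```python
def total_loss(l_content, l_style, alpha=1.0, beta=1000.0):
    # alpha/beta are the content/style trade-off weights you end up
    # tuning by hand (or via sliders); beta >> alpha is typical.
    return alpha * l_content + beta * l_style

print(total_loss(2.5, 0.002))             # 2.5 + 2.0 = 4.5
print(total_loss(2.5, 0.002, beta=1e4))   # heavier style weighting: 22.5
```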
Challenges
The hardest part for me by far was getting a working TensorFlow installation that used my GPU. At the time, TensorFlow 2 was still pretty new, so things may be better with that major version now, but it was a real struggle with v1.8 (unlike PyTorch, which was a breeze to install for my other projects). To solve this and make my setup easily repeatable, I ended up using the Keras Dockerfile examples from their repo, but that was a complexity I did not expect before starting.
I also remember struggling a bit to re-arrange the network: adding the variable input at the entrance, and replacing the Max Pool layers with Avg Pool layers as suggested by other projects.
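The pooling swap can be illustrated on a toy 2-D feature map (a minimal NumPy sketch of the operation itself, not the actual Keras layer surgery):

```python
import numpy as np

def pool2x2(x, mode="avg"):
    """2x2 pooling with stride 2 on a 2-D feature map. Average pooling
    propagates gradients to every input pixel, which tends to give
    smoother results for image synthesis than max pooling."""
    h, w = x.shape
    blocks = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3)) if mode == "avg" else blocks.max(axis=(1, 3))

fmap = np.array([[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0],
                 [9.0, 1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0, 7.0]])
print(pool2x2(fmap, "max"))  # [[6. 8.] [9. 7.]]
print(pool2x2(fmap, "avg"))  # [[3.5 5.5] [4.75 4.5]]
```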
Results and final notes
The workflow in the Streamlit app looked like this in the end:
And I also used imgflip.com to create the gif shown in this post.
I hope you'll find this project useful if you decide to try Neural Style Transfer yourself. You can find the repo here.