I ran a couple of photos through Google’s Deep Dream software. It attempts to recognize patterns using a buttload of heuristics about characteristics that would indicate various things; buildings, animals, landscapes,etc.

It can be sort of gently psychedelic and beautiful or if you turn the levels up a bit frankly horrifying. In its nicer form It reminds me of things I’ve seen when responsibly ingesting socially acceptable pharmaceuticals…or something. This feels more like what I imagine schizophrenia might be like. Here’s the worst before and after ever.

In an interesting way this is a computer emulating the very human trait of constantly seeing things in the world around them that are mental projections. Like “Doesn’t that cloud look like a bunny attacking Abe Lincoln?” This trait is keyed into our ability to recognize anything but also to our ability to recognize types of things. Like recognizing a letter of the alphabet in a strange distorted font. Or like recognizing a building as a bank or a school without seeing a sign. It’s probably also related to the neurological blinders we develop with which we see exactly what we expect to see and don’t see anything we don’t expect to see.

The role this plays in scientific discovery seems clear cut to me. The role this plays in various kinds of bigotry and blind prejudice seems clear too. 

On a happier note, here is a video of a much more beautiful and dreamy example of this sort of software at work.  What it’s doing here is interpreting visual static.

Noisedive from Johan Nordberg on Vimeo.

This is what emerges when you recursively feed noise to a artificial neural network trained to recognise places (e.g. you give it an image and the response is something like 90% beach , 20% desert, 2% swimming pool).

The familiar shapes you see floating past is abstractions the network has made for the different categories. Some categories feature more prominently because the amount of images used to train them differ. For example the “fountain” category contains 111,496 images compared to only 883 in the “nuclear_power_plant” one. So that’s why you see a lot more fountain like shapes than big chimneys.

There’s a lot more to it than that of course, and this visualisation only shows you one of the many abstraction layers the network uses.

Here’s some of my favourite places from this dive:


Rendered on Amazon EC2 using their g2.2xlarge GPU instance, total render time was 38 hours.

Based on the work by Google researches Alexander Mordvintsev, Christopher Olah and Mike Tyka


The places image database including a selection of pre-trained neural networks can be found on http://places.csail.mit.edu