I’ve been continuing my adventures in image manipulation with Matlab, taking the opportunity to play with a technique I’ve been interested in for a long time – ‘slit scan’ or ‘strip’ photography. A very brief (but rather maths-y) explanation of what’s going on in the clip above would be the following:
Let a video be defined by T frames, each of dimension X-by-Y. Then the pixel value to be displayed at location (x,y) at time t is simply V(x,y,t) for some 3-dimensional array V; so the kth frame corresponds to the 2-d image F_k given by the plane t=k. But we may consider other planes to generate frames: by fixing a horizontal position x=k, individual frames are the images F'_k(t,y)=V(k,y,t), and iterating through these gives a new video V'(x,y,t)=V(t,y,x).
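The project itself was done in Matlab, but the axis swap is easy to sketch in any array language. Here is my own NumPy illustration (names and the T×Y×X storage order are my choices, not the author's code):

```python
import numpy as np

# Toy video: V[t, y, x] holds the pixel at column x, row y, frame t.
T, Y, X = 6, 4, 5  # tiny dimensions for illustration
V = np.arange(T * Y * X).reshape(T, Y, X)

# The slit-scan video V'(x, y, t) = V(t, y, x): swap the time and
# horizontal axes, leaving the vertical axis alone.
Vp = V.transpose(2, 1, 0)

# Frame k of the new video is the plane x = k of the original: its
# columns are the k-th columns of the original frames, in time order.
frame_k = Vp[2]  # shape (Y, T)
```

Note that applying the same transpose twice recovers the original video, so the transformation is its own inverse.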
To more easily understand what’s going on with the video, it’s helpful to take a step back and think about images. My source data for this project was 80 seconds of footage from a surprisingly sunny day in Edinburgh. Captured in 1080p, the video is 1920 pixels wide and 1080 pixels high. If I were to pause it at some instant t, I would see something entirely familiar: a 1920-by-1080 still image capturing the whole location at that instant, like this:
(click through for full size)
The original video was recorded at 24 frames per second, so I can decompose the entire thing into a collection of 80*24=1920 distinct images, one for each time step.
But now let’s suppose that I went through each of those images in turn, but kept only the middle column of pixels: this would be the same as if my camera were viewing the scene through a ridiculously narrow slit. But I have 1920 such images – one for each frame – and each is 1080 pixels tall, albeit only one pixel wide. If I line them up next to each other I can therefore build a 1920-by-1080 image, like so:
(click through for full size)
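The construction of that still can be sketched in a few lines. This is my own NumPy version of the idea (the post doesn't show the original Matlab), assuming the decoded footage is available as a T×H×W array, scaled down here so the example stays small:

```python
import numpy as np

# Stand-in footage: T frames of H-by-W pixels (random noise here; in
# the real project this would be 1920 frames of 1080x1920 video).
T, H, W = 240, 108, 192
frames = np.random.rand(T, H, W)

# Keep only the middle column of each frame, and line those columns
# up side by side, one per frame: column t of the result is the
# middle column of frame t.
mid = W // 2
slit_image = frames[:, :, mid].T  # shape (H, T)
```

For the real footage the result is 1080 pixels tall and 1920 wide – the same dimensions as a single frame, which is what makes the comparison above so striking.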
This has familiar fragments – particularly the reasonably normal-looking pedestrians in the background – but is somehow a much stranger slice of the footage! In the original scene, the left of the image corresponds to ‘further west’ whilst the right is ‘further east’. But now we’ve swapped this spatial dimension for time, so the left is ‘earlier’ and the right is ‘later’: we’re at a fixed position in the east/west sense throughout. There are a couple of particularly significant effects of this:
- Everything faces ‘left’. Consider a bus travelling from east to west – right to left in the original video. Then it starts to pass through our capture slit at some time, and the first thing that will appear is the front windscreen, followed by the doorway, front wheel, various windows and so on to the back. But the same is true of a bus travelling from west to east: we still see first the windscreen, then the door, then the front wheel… As we arrange our columns to build the new image, earlier views appear to the left of later ones, so we always place the front of the bus at the left, and the back at the right. Well, not always: if the bus had reversed down the street, or a pedestrian moonwalked through the scene, then we would capture their back first and so present them facing the opposite way.
- Width now corresponds to speed, not size: the longer you spend in the capture slit, the more columns you will appear in and thus the more width you will occupy. At one extreme we have the immobile background – causing the horizontal banding – and at the other, speedy pedestrians who only appear for a fraction of a second and thus are recorded only as a narrow sliver. This extends to the dimensions of the image itself – the longer the original footage, the wider the images. Particularly notable in this shot is the bus towards the middle, which stopped for a while before driving away again!
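The width-equals-time effect is easy to check numerically. In this toy model (my own sketch, not part of the original project), an object of fixed length crosses a one-pixel slit at various speeds, and we count how many frames – and hence how many columns of the slit-scan image – it occupies:

```python
def slit_width(speed, length=20, slit_x=100, n_frames=200):
    """Count the frames during which a moving object covers the slit.

    The object's leading edge starts at x=0 and advances `speed`
    pixels per frame; it covers [left, left + length) at each frame.
    """
    count = 0
    for t in range(n_frames):
        left = speed * t
        if left <= slit_x < left + length:
            count += 1
    return count

# Halving the speed doubles the apparent width: slit_width(1) == 20,
# slit_width(2) == 10, slit_width(4) == 5.
```

So in the slit-scan image the same 20-pixel-long object is rendered 20 columns wide when it crawls past at 1 pixel per frame, but only 5 columns wide at 4 pixels per frame.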
With the individual frames understood, we can now move on to the video. As we proceed forward in time with playback, we are moving east in spatial terms! A normal movie creates the illusion of movement (hence the name!) in space by presenting times in rapid succession, so with our time/space flip we create the illusion of movement in time through a succession of places. We can use this to infer some of the original spatial information, considering the points above:
- Whilst each feature – be it a bus, pedestrian or cyclist – still faces left, their image will move left or right during playback depending on their spatial direction. Consider again someone moving west to east (conventionally left to right). This means that we see them at the west earlier than the east, so for smaller – further west – x values they’ll appear sooner, which is to the left of the frame. But as we advance eastwards by increasing x, they won’t be captured until later. The (surprising to me!) effect is that they therefore moonwalk across the frame! However, the reverse applies when travelling west: at early locations it takes a while for them to arrive so they’re to the right of the frame, but for later frames they appear sooner and thus further left. This is less disturbing, as they appear to be walking ‘forward’. Although I’ve not spotted this anywhere, had anything failed to travel across the entire scene, it would either appear out of thin air, or simply disappear, depending on the direction of travel.
- As width corresponded to speed, so change in width must correspond to acceleration. An object slowing down will have a squished front and extended back; speeding up will squish the back. Both effects apply to the bus which stopped: some of it speeds past and out of view, and the rest is then visible for a long time at many locations before pulling away, which compresses the back. It’s also possible to infer objects moving closer to the camera (at right angles to the direction of travel, due to changing lane) if their height increases.
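The moonwalking effect can also be verified in a toy simulation (again my own sketch, not the original Matlab): mark a single bright column moving west-to-east through an otherwise black video, build the slit-scan video by swapping time and x, and track which way the mark drifts during playback:

```python
import numpy as np

T, Y, X = 60, 8, 60
V = np.zeros((T, Y, X))
for t in range(T):
    V[t, :, t] = 1.0  # object moves west-to-east at 1 pixel per frame

Vp = V.transpose(2, 1, 0)  # slit-scan video: V'[x, y, t] = V[t, y, x]

# In each new frame k (a fixed spatial position), find where the
# object sits along the horizontal (time) axis.
positions = [int(np.argmax(Vp[k, 0, :])) for k in range(X)]

# The positions increase frame by frame: the east-bound object drifts
# rightward during playback even though its front points left.
```

An object crossing slit x=k at time t=k appears at horizontal coordinate k in new frame k, so the positions climb steadily: rightward drift with a left-facing front, exactly the moonwalk described above. Reversing the object's direction in the setup makes the positions decrease instead.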
So hopefully this gives some idea of what’s going on, and you may even be able to piece together the original sequence of events! To check if you’re right, here’s the original, conventionally-projected footage: