Terence Chan Photography

How to Create Stop Motion Animation/Slideshow in Linux without Special Software


Introduction

In this article, I will describe how to create stop motion animation videos or photo slideshows in Linux, using only standard Linux commands and video encoding software which comes with almost all Linux distributions, without the need for any specialized software. Using the method described here takes a similar amount of time and effort - perhaps even less, depending on the nature of the animation - as using the GUI of most animation/slideshow programs. But given that there are many slideshow and stop motion programs already written for Linux, you may be wondering "why bother?" So first, let me give some motivation for this little project. If you are impatient and just want to cut to the chase, skip ahead to the section "The details".

Here is the video which was made using the method described below.

If you have already used one of the stop motion or slideshow programs available for Linux successfully and it all works well for you, you need read no further. However, if you do not find the available software completely ideal, or if you are just curious, then read on.

This project started when I wanted to create a short stop motion animation video or photo slideshow using some photos I had taken. There was not a great amount of motion in my photos and I did not want to create a very smooth series of animated motions, involving a very large number of individual frames. However, unlike a standard slideshow, I did want the individual images to be displayed for varying lengths of time to create a sense of movement at varying speeds in the animation. I also wanted to encode the resulting video in various formats for different purposes, ranging from displaying on a home computer or web page to an HD video for displaying on a large screen.

Since the animation I wanted to create was not very complicated and involved only a relatively small number of images, I started by looking at slideshow software which allows the length of time each image is displayed to be varied. The first promising program I came across was Imagination, which is billed as a "lightweight and simple DVD slide show maker for Linux", which sounded like just what I needed. Unfortunately, when I downloaded the source code and tried to compile it, I discovered it required a version of Gtk+2 newer than the one available on the latest version of my Linux distribution, CentOS, so doing a 'yum update gtk+' just resulted in a message telling me I already had the most recent version installed. Being the lazy person I am, I gave up at this point rather than trying to download the required version of Gtk+2 and install it manually.

I next came across a stop motion animation program for Linux, Stopmotion. It compiled successfully and I was able to run it and succeeded in making a very short rudimentary test stop motion video. However, it did not work as well as I had hoped. To cut a long story short, the version I built on my Linux distribution had some strange behaviours and lacked certain key functions, for reasons which I was again too lazy to investigate. A more general criticism is that it only allows creation of frames by dropping individual images into a timeline (which I believe is typical of most stop motion software) - there is no way to automatically generate a large number of frames from the same image. This would be fine if there were a lot of motion and each individual image appeared for only a fraction of a second. For my video (and for most slideshows in general), most images appear for over 1 second, and at 25fps, that means at least 25 frames per image.

However, I noticed when trying to change the video encoding settings that Stopmotion was using either mencoder (which is part of the mplayer package) or ffmpeg to do the encoding, and it was simply allowing the user to edit the command line options to either mencoder or ffmpeg. At this point, I thought, if that was the case, I could just issue the mencoder/ffmpeg command directly from the Linux command line. This was how I ended up working out how to do the whole thing from the Linux command line, without using any special software, other than mencoder or ffmpeg, which are included as standard with most Linux distributions (or if not, they can be easily installed by simply doing 'yum install mplayer' or 'yum install ffmpeg', or whatever the equivalent of 'yum' is for your Linux distribution, e.g. 'apt-get' for Ubuntu).

What I find appealing about the method described below, apart from its simplicity and transparency, is that it has a rather clunky DIY Heath-Robinson feel to it, which I think is more in keeping with the spirit of early Linux. Nowadays, Linux is all very slick and can sometimes feel like every other operating system out there. However, I still prefer to do most things from the command line in a terminal window. (If I wanted an operating system that will always spoon-feed me and tie my shoelaces for me, I can always use Mac OS, or even MS Windows!)

The details

In what follows, I will assume that you have already created all the images that are to be used in the animation, and that they are all in a directory which I will call the "image directory" (imgdir). It is not necessary to have all the source images in the same directory - it is perfectly possible to use the scripts below even if your source images are scattered over several different directories. However, if you have a large number of source images (say, over 100), you will find it much more convenient to have all your source images in the same place, and preferably given names in a lexicographical order that at least roughly corresponds to the order in which the images are to appear in the animation. If you are shooting the images with a digital camera, more or less in the order that the images are to appear in the animation, then the camera will already have named the image files in this way.

All the source images have to have exactly the same pixel dimensions - even a 1 pixel difference will cause mencoder to crash. If your images are of different dimensions, you would need to create a common background canvas onto which all your images can fit.
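One way to check the dimensions before going any further is sketched below. The helper name 'check_dims' is made up for this example, and the usage line assumes that ImageMagick's 'identify' command is installed; the function itself only compares lines of the form "WIDTHxHEIGHT filename".

```shell
# check_dims (hypothetical helper): reads "WIDTHxHEIGHT filename" lines on stdin
# and reports every file whose dimensions differ from those on the first line.
# Returns 0 if all dimensions match, 1 otherwise.
check_dims() {
    local first="" dims name status=0
    while read -r dims name; do
        [ -z "$dims" ] && continue
        if [ -z "$first" ]; then
            first="$dims"
        elif [ "$dims" != "$first" ]; then
            echo "mismatch: $name is $dims, expected $first"
            status=1
        fi
    done
    return "$status"
}

# Typical use (assumes ImageMagick is installed):
#   identify -format '%wx%h %f\n' imgdir/*.jpg | check_dims
```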

There are two basic steps to creating a stop motion animation/slideshow:

  1. arrange all the individual frames into separate files (1 file per frame) in lexicographical order, corresponding to the order in which the frames are to appear, in the same directory, which I will call the "working directory";

  2. encode the frames from step 1 into a video.

These two steps are independent of each other, so you can follow my method for just one of the steps while using your own method for the other step, if you wish. If you already have your own method for doing Step 1, you can jump straight to Step 2.

Step 1: creating the individual frames

First, decide how long each image in the image directory is to appear on screen. Having decided this, you then need to choose a value for the number of frames per second (fps) which will support the shortest time that an image is to be displayed. For example, if the shortest time that an image is to appear is 1/4 second, you need at least 4fps, so that a single frame corresponds to 1/4 sec; an image which is to appear for 2 seconds will then need to occupy 8 consecutive frames. However, note that although the MPEG-4 format can support any fps down to 1fps, certain other formats such as MPEG-2 only support a small set of fixed frame rates (25fps for PAL) - for more details, see Step 2 below.
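The arithmetic is simply duration multiplied by fps. As a small sketch, the helper below (the name 'nframes' is invented here) does the multiplication with awk so that fractional durations work, rounding to the nearest whole frame.

```shell
# nframes SECONDS FPS: print the number of frames an image must occupy,
# rounded to the nearest whole frame (illustrative helper).
nframes() {
    awk -v s="$1" -v f="$2" 'BEGIN { printf "%d\n", s * f + 0.5 }'
}

# nframes 2 4      -> 8   (2 seconds at 4fps)
# nframes 0.25 4   -> 1   (1/4 second at 4fps)
```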

You then need to copy each image into a number of consecutive frames corresponding to how long it is to appear, and all the individual frames must be separate image files with file names that are in the correct lexicographical order corresponding to the order in which they are to appear. Simply naming each frame 1.jpg, 2.jpg ... etc will not work, because 10.jpg comes before 2.jpg in lexicographical order. (For the purposes of this discussion, lexicographical order means the order in which the ASCII code for each character appears.)
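The ordering problem is easy to see by sorting a few names in plain byte order. In the sketch below, 'LC_ALL=C' forces 'sort' to compare ASCII codes; the second command shows that names padded with leading zeros to a fixed width sort in numerical order.

```shell
# Unpadded names: byte order puts 10.jpg before 2.jpg.
printf '%s\n' 1.jpg 2.jpg 10.jpg | LC_ALL=C sort
# prints: 1.jpg, 10.jpg, 2.jpg (one per line)

# Names padded to 5 digits: byte order now agrees with numerical order.
printf '%05d.jpg\n' 1 2 10 | LC_ALL=C sort
# prints: 00001.jpg, 00002.jpg, 00010.jpg (one per line)
```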

If you were creating a stop motion animation in the "standard" way, where objects in a scene are moved in small incremental amounts and then photographed, and each shot corresponds to a single frame which is to be displayed for the same amount of time, then all the frames would already be given names in lexicographical order by the camera - provided you are shooting with a digital camera. In this case, all you have to do is transfer all the images from your digital camera to the working directory on your computer, and you can jump straight to Step 2. However, if the images you shot are to be displayed for different lengths of time, or if the images were not shot in the same order as you want them to appear in the video (which is probably the case if you are creating a slideshow from a collection of images), then you need to generate the necessary frames in lexicographical order using the method described below.

To generate frames in the correct lexicographical order, you need to pad the frame numbers with an appropriate number of leading 0's: 00001.jpg, 00002.jpg ... etc. For example, at 25fps, if image1.jpg in the image directory is to appear for 1 second, you will need to generate 25 copies of image1.jpg named 00001.jpg, 00002.jpg ... 00025.jpg in the working directory. If the 2nd image, image2.jpg, is to appear for 2 seconds, you will then need to generate 50 frames named 00026.jpg, 00027.jpg ... 00075.jpg, and so on. I have written the following shell script, which I will call 'mkframes', for this purpose:

Rather than making multiple copies of the same image, the script simply creates multiple soft links to the source image file. Not only will this make the 'mkallframes' script (see later) run much faster and save disk space, this also has the advantage that if you were to edit the source image later, you don't need to generate the frames again.

The script is to be invoked (in the working directory) as follows:

For example,

will create soft links to imgdir/image1.jpg named 00001.jpg, 00002.jpg, ... 00025.jpg in the working directory. Next,

will create links to imgdir/image2.jpg named 00026.jpg, 00027.jpg, ... 00075.jpg.

Note that the above script names individual frames using 5 figures with leading 0's where necessary. This will allow for videos with a maximum of 99,999 frames. You can change the number of figures by modifying the 'printf "%05d"' part if you need more frames than this. Also, the script will automatically determine the file type of the source image, so if the source image was image1.png instead, it will generate frames 00001.png, 00002.png etc.
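As a rough illustration of the behaviour just described, a 'mkframes'-style script could be sketched as the shell function below. The argument order (first frame, last frame, source image) is an assumption based on the usage examples in this article, and this is a sketch rather than a reproduction of the script referred to above.

```shell
# mkframes FIRST LAST IMAGE   (sketch; argument order is an assumption)
# Creates soft links FIRST..LAST, padded to 5 digits, in the current directory,
# all pointing at IMAGE. The link's extension is copied from IMAGE.
# IMAGE should be an absolute path or a path relative to the working directory.
mkframes() {
    local n="$1"
    local last="$2"
    local img="$3"
    local ext="${img##*.}"      # e.g. jpg or png
    while [ "$n" -le "$last" ]; do
        ln -sf "$img" "$(printf '%05d' "$n").$ext"
        n=$((n + 1))
    done
}

# Example, run in the working directory:
#   mkframes 1 25 imgdir/image1.jpg     # links 00001.jpg ... 00025.jpg
```

To allow more than 99,999 frames in this sketch, change '%05d' to a wider field.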

The file names of frames do not have to be numbered consecutively - they just have to be in the correct lexicographical order. So having generated a number of frames, you can delete or insert more frames, while maintaining the correct lexicographical ordering. For example, if you delete a certain number of frames, the remaining frames will still be in the same lexicographical order as before. On the other hand, inserting new frames is a little more complicated, but all that is required is to pad out the file names with 0's on the right. For example, if you want to insert 3 frames between 00001.jpg and 00002.jpg, all you need to do is to call the new frames 000010001.jpg, 000010002.jpg and 000010003.jpg. They will all be after 00001.jpg because '0' comes after '.' in lexicographical order, and they will all be before 00002.jpg because '1' comes before '2' in lexicographical order.

I have written 2 bash scripts for this purpose. The following script, which I will call 'insframes', is for inserting new frames:

You invoke this script as follows:

For example, if you want to insert 5 frames using imgdir/image7.jpg followed by 10 frames using imgdir/image8.jpg between frames 00001.jpg and 00002.jpg, you would do

This will generate 5 links to image7.jpg called 0000100001.jpg ... 0000100005.jpg and 10 links to image8.jpg called 0000100006.jpg ... 0000100015.jpg. (You can also do

because the script will automatically strip out the file extension from 00001.jpg when working out the file names for the new frames.)
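A minimal sketch of an 'insframes'-style function that produces the file names described above is given below. The argument order (first new frame, last new frame, existing frame, source image) is assumed from the examples in this article; it is an illustration, not necessarily identical to the script referred to above.

```shell
# insframes FIRST LAST AFTER IMAGE   (sketch; argument order is an assumption)
# Creates soft links named AFTER followed by a 5-digit frame number, so that
# they sort immediately after the existing frame AFTER.
insframes() {
    local n="$1"
    local last="$2"
    local after="${3%.*}"       # strip any extension: 00001.jpg -> 00001
    local img="$4"
    local ext="${img##*.}"
    while [ "$n" -le "$last" ]; do
        ln -sf "$img" "${after}$(printf '%05d' "$n").$ext"
        n=$((n + 1))
    done
}

# Example, run in the working directory:
#   insframes 1 5 00001 imgdir/image7.jpg    # 0000100001.jpg ... 0000100005.jpg
```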

To delete existing frames, use the following script, which I will call 'rmframes':

You invoke the script as follows:

For example,

will delete all frames which are lexicographically between 00022.jpg and 00056.jpg inclusive. Note that you need to give the full file name (including any extensions) when using 'rmframes'.
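For illustration, an 'rmframes'-style function might look like the sketch below. It compares names with the shell's '<' and '>' string tests, which use ASCII ordering in the 'test' builtin. As a safety measure added in this sketch (not something stated above), it only removes soft links and never touches regular files.

```shell
# rmframes FIRST LAST   (sketch)
# Removes every soft link in the current directory whose name lies
# lexicographically between FIRST and LAST inclusive.
rmframes() {
    local first="$1"
    local last="$2"
    local f
    for f in *; do
        [ -L "$f" ] || continue      # safety: only remove soft links
        if [ ! "$f" \< "$first" ] && [ ! "$f" \> "$last" ]; then
            rm -- "$f"
        fi
    done
}

# Example:
#   rmframes 00022.jpg 00056.jpg
```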

You will notice in the usage of the above scripts that there is a difference between frames that are numbered "naturally", such as the '1 25' in

        mkframes 1 25 image1.jpg

or the '3 6' in

        insframes 3 6 00012 image27.jpg

and frames which need to have all the padding with 0's, or even the full file name, such as the '00012' in

        insframes 3 6 00012 image27.jpg

or both arguments in

        rmframes 00022.jpg 00056.jpg

The basic rule to remember is that when referring to new frames that are yet to be created/inserted, you should just use the "natural" numbering, 1,2,3,... etc, whereas when referring to existing frames, you need to use the full file name. This is because the scripts have no way of knowing which frames are already there and how they have been numbered, so you need to tell the script the full name of the frames, whereas for new frames that are yet to be created, you just want to worry about how many frames there are and where they appear in the numerical sequence, and leave the actual naming with padding by 0's to the script.

NB: a note about lexicographical ordering. As mentioned earlier, lexicographical ordering means the order in which the corresponding ASCII codes appear. The way I have earlier described the lexicographical order of inserted frames relative to existing frames complies with this, and that is exactly how mencoder or ffmpeg will treat the file names (in Step 2). However, the 'ls' command lists files in a rather bizarre order, which I still cannot understand. If you do a 'ls' after inserting some frames, 0000100001.jpg will actually be listed before 00001.jpg! Do not be confused by this - in lexicographical order, 0000100001.jpg definitely comes after 00001.jpg.
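A likely explanation for this behaviour is that 'ls' sorts according to the collation rules of the current locale, and many locales give punctuation such as '.' a lower weight than digits and letters. Forcing the C locale makes 'ls' use plain byte order instead. A small sketch (the wrapper name 'lsframes' is made up):

```shell
# lsframes: list files in plain byte (ASCII) order, whatever the system locale.
lsframes() {
    LC_ALL=C ls "$@"
}

# In a directory containing 00001.jpg, 0000100001.jpg and 00002.jpg,
# lsframes lists them in exactly that order.
```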

Finally, once you have decided how many frames you need for each image (based on the fps you have chosen and how long each image is to appear), I suggest you put all the 'mkframes' and 'insframes' commands into another script, say 'mkallframes', along with any helpful comments, e.g.

and then run 'mkallframes' to generate all the frames for the entire video/slideshow. This way, not only will you have a complete record of everything you've done, but you can also edit the script and re-run it later if necessary. Once you have encoded the frames into a video, you can then delete all the frames (after all, they are just duplicate copies of images in the image directory), knowing that you can easily re-create them again by re-running the script 'mkallframes'.
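To give an idea of the shape of such a file, here is an illustrative 'mkallframes', written as a shell function. The image names, frame numbers and timings are invented for this example, and it assumes that commands (or functions) called 'mkframes' and 'insframes' exist and take their arguments in the order used elsewhere in this article.

```shell
# mkallframes (illustrative): all image names and timings below are made up.
# Assumes mkframes FIRST LAST IMAGE and insframes FIRST LAST AFTER IMAGE.
mkallframes() {
    # Everything below is at 25 fps.
    mkframes   1  25 imgdir/image1.jpg          # opening image, 1 second
    mkframes  26  75 imgdir/image2.jpg          # 2 seconds
    mkframes  76  85 imgdir/image3.jpg          # 0.4 seconds
    # Afterthought: 5 extra frames squeezed in after frame 00025.
    insframes  1   5 00025 imgdir/image7.jpg
}
```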

If you have a large number of source images whose lexicographical order is the same as the order in which they appear (which is likely to be the case if you follow my advice at the beginning to have all your source images named in a lexicographical order that roughly corresponds to the order in which they are to appear), and if they are all to appear for the same length of time (and hence you need to generate the same number of frames for all of them), you can use the following script, which I will call 'mkmultiframes', to save yourself a lot of typing:

You invoke this script (in the working directory) as follows:

For example, if image0011.jpg .. image0020.jpg are all to appear for 0.4 seconds (10 frames at 25 fps), starting at frame number 101, then instead of typing the necessary 'mkframes ...' commands 10 times, you could do

This would add the following lines

to the end of the 'mkallframes' file.

Note that the 'mkmultiframes' script can also make frames from source images in reverse order, if 'start_img' is lexicographically after 'end_img'; so for example, the command

would add the following lines

to the end of the 'mkallframes' file.
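A sketch of how a 'mkmultiframes'-style generator could work is given below. The argument order (starting frame number, frames per image, 'start_img', 'end_img') is an assumption, as is the optional fifth argument naming the file to append to. It emits one 'mkframes FIRST LAST IMAGE' line per image and will not cope with file names containing spaces.

```shell
# mkmultiframes START_FRAME NFRAMES START_IMG END_IMG [OUTFILE]   (sketch)
# Appends one "mkframes" line per image between START_IMG and END_IMG
# (inclusive, byte order) to OUTFILE, which defaults to ./mkallframes.
# If START_IMG sorts after END_IMG, the images are taken in reverse order.
mkmultiframes() {
    local frame="$1"
    local count="$2"
    local dir
    dir=$(dirname "$3")
    local lo="${3##*/}"
    local hi="${4##*/}"
    local out="${5:-mkallframes}"
    local order=""
    local tmp
    if [ "$lo" \> "$hi" ]; then
        tmp="$lo"; lo="$hi"; hi="$tmp"; order="-r"
    fi
    printf '%s\n' "$dir"/* | LC_ALL=C sort $order | while IFS= read -r f; do
        f="${f##*/}"
        if [ ! "$f" \< "$lo" ] && [ ! "$f" \> "$hi" ]; then
            echo "mkframes $frame $((frame + count - 1)) $dir/$f"
            frame=$((frame + count))
        fi
    done >> "$out"
}

# Example:
#   mkmultiframes 101 10 imgdir/image0011.jpg imgdir/image0020.jpg
```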

Similarly, if you have a lot of lexicographically consecutive images all to be inserted for the same amount of time, after the same frame, you can make use of the following script, 'insmultiframes':

For example, if image0011.jpg .. image0020.jpg are all to appear for 0.4 seconds (10 frames at 25 fps), to be inserted after frame 00145 starting from frame number 11, then you could do

This would then generate the following lines

which you can then cut and paste into the appropriate section of the 'mkallframes' file. (Or, if you don't mind the 'mkallframes' file being a bit disorganized, you could even just append these lines to the end of 'mkallframes', since the frames can be generated in any arbitrary order - the end result is still a collection of frames named in the correct lexicographical order.) Like 'mkmultiframes', the 'insmultiframes' can automatically detect the order in which source images are to be processed depending on whether 'start_img' is lexicographically before or after 'end_img'.
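An 'insmultiframes'-style generator can be sketched in the same spirit. Here the argument order (existing frame, first new frame number, frames per image, 'start_img', 'end_img') is again an assumption made for this example; the generated lines are printed to the terminal so that they can be pasted or redirected wherever they are wanted.

```shell
# insmultiframes AFTER START_FRAME NFRAMES START_IMG END_IMG   (sketch)
# Prints one "insframes" line per image between START_IMG and END_IMG
# (inclusive, byte order), in reverse if START_IMG sorts after END_IMG.
insmultiframes() {
    local after="$1"
    local frame="$2"
    local count="$3"
    local dir
    dir=$(dirname "$4")
    local lo="${4##*/}"
    local hi="${5##*/}"
    local order=""
    local tmp
    if [ "$lo" \> "$hi" ]; then
        tmp="$lo"; lo="$hi"; hi="$tmp"; order="-r"
    fi
    printf '%s\n' "$dir"/* | LC_ALL=C sort $order | while IFS= read -r f; do
        f="${f##*/}"
        if [ ! "$f" \< "$lo" ] && [ ! "$f" \> "$hi" ]; then
            echo "insframes $frame $((frame + count - 1)) $after $dir/$f"
            frame=$((frame + count))
        fi
    done
}

# Example:
#   insmultiframes 00145 11 10 imgdir/image0011.jpg imgdir/image0020.jpg
```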

Final notes

Once all the frames are in place in the working directory, you can "preview" the video before doing any encoding, by running 'display' (part of ImageMagick, which also comes as standard with most Linux distributions), then go to the "File -> Visual Directory" menu and type in '*.jpg'; as the thumbnail directory is being created, you can watch the individual thumbnails flash up on the screen in quick succession, resulting in an animation effect. The fps and duration of the images won't be correct, but the relative duration of each image, relative to the durations of other images, will be correct. The result is usually a speeded-up version of the video.

All video displays (computer monitors, TV screens etc) are in landscape format and if the source images are in portrait format, the resulting video/slideshow will have large blank borders on the left and right sides. For high quality videos, in order to make the most of the screen area, rotate your source images by 90 degrees, so that they are lying on their side in landscape format, then rotate the display device by 90 degrees when the video is played.

To encode images (at least with mencoder), all your images need to be in the same colour space. So if you have a mixture of black-and-white and colour images, all your B&W images will have to be in RGB format, not greyscale. However, a mixture of different file formats (.jpg, .png, .bmp) is allowed.

Step 2: encoding the video

This is by far the easier of the 2 steps. I will only describe how to use mencoder to do the encoding, just because this is the encoder I am most familiar with. For those who prefer ffmpeg, similar methods and options can also be used, only ffmpeg has different names for some of the options.

I won't bother to go into all the details about all the options, as there is already a good manual for mencoder. Start by reading Chapter 6 of the manual, which contains some basic mencoder commands to get you up and running. For more details about the various codecs and options, read Chapter 7 of the manual. The man pages for mencoder and ffmpeg are also good sources of reference.

There is a lot written already about encoding videos with mencoder/ffmpeg, and a Google search will throw up lots of articles on the subject. However, I should say a few words about video encoding that are specific to stop motion animation, as most of what is written about the quality of MPEG-n vs H264, appropriate bitrates etc is for motion videos. A stop motion animation composed of still images is typically of relatively low video complexity, unless there is a lot of motion which is to be animated in a very smooth way using a large number of individual images. For such animations, the key factors that determine the final video quality are firstly the quality and resolution of the source images, and secondly the quality of the display device (e.g. an HDTV monitor will give noticeably higher quality images than a PAL standard monitor); the particular encoding method used is of secondary importance - any decent video encoder using any codec will not degrade the quality of the source images in any noticeable way. Therefore, it is far more important to ensure a high quality of source images, and to choose a resolution close to the maximum resolution that the display device is capable of (for HD video, this would be about 1080 pixels in the shorter dimension, corresponding to the 1080p resolution of HDTV monitors). Considerations such as MPEG-2 vs MPEG-4, 2-pass vs single-pass encoding, 8000 vs 16000 bitrate etc will not have a material impact on the final quality of the video.

MPEG-4 encoding

This is suitable for display on a typical computer screen or streaming on the web.

To use mencoder to generate an n-fps MPEG-4 stop motion animation video, using jpeg files in the current directory, first compute the optimal vbitrate as follows:

        vbitrate=60*⟨fps⟩*width*height/256
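The formula can be evaluated directly with shell integer arithmetic. The function name 'vbitrate' below is just a convenience for this example; for instance, at 25fps with 720x576 images the formula gives 2430000.

```shell
# vbitrate FPS WIDTH HEIGHT: evaluate 60*fps*width*height/256 (integer division)
vbitrate() {
    echo $(( 60 * $1 * $2 * $3 / 256 ))
}

# vbitrate 25 720 576   -> 2430000
```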

Then do:

To use mencoder to generate a 2-pass MPEG-4 video, issue the commands:

With MPEG-4, any value of fps down to 1fps is supported.
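As a hedged sketch of what such mencoder invocations can look like, the functions below print (rather than run) command lines built from mencoder's 'mf://' multi-file input and the lavc MPEG-4 codec. The function names are invented, and the particular options chosen ('mbd=2:trell', '-oac copy') follow the multi-image example in the MPlayer documentation rather than anything stated in this article. Pipe the output to 'sh' to run it.

```shell
# Common options: FPS VBITRATE [EXTRA_LAVCOPTS]
mpeg4_opts() {
    printf '"mf://*.jpg" -mf fps=%s:type=jpg -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=%s:mbd=2:trell%s -oac copy' \
        "$1" "$2" "${3:+:$3}"
}

# mpeg4_1pass FPS VBITRATE OUTPUT: print a single-pass command line
mpeg4_1pass() {
    echo "mencoder $(mpeg4_opts "$1" "$2") -o $3"
}

# mpeg4_2pass FPS VBITRATE OUTPUT: print the two command lines of a 2-pass encode
mpeg4_2pass() {
    echo "mencoder $(mpeg4_opts "$1" "$2" vpass=1) -o /dev/null"
    echo "mencoder $(mpeg4_opts "$1" "$2" vpass=2) -o $3"
}

# Example:
#   mpeg4_1pass 25 2430000 output.avi | sh
```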

MPEG-2 encoding

This is usually more suitable for high quality displays on large screens, and if you want to make a PAL DVD of your video, you would have to use this format. Note that MPEG-2 only supports a fixed set of frame rates, and for PAL video this must be 25fps.

To use mencoder to generate a 25fps MPEG-2 stop motion animation video, using jpeg files in the current directory, issue the command:

To generate a high-quality PAL DVD compliant MPEG-2 video, do

The 8000kbps bitrate is typical of good quality SD video and PAL, which has a max bitrate of 9800. For low-complexity stop motion animations, this is more than sufficient.

The '-vf scale=720:576' option above complies with the resolution of PAL 4:3 displays and assumes that the source images are in landscape format; for source images in portrait format, see the remarks at the end of the section Step 1.
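For reference, here is a sketch of mencoder command lines along these lines, again printed rather than run. The DVD variant uses the PAL DVD options given in the MEncoder documentation (with the audio options left out and '-nosound' added, which is a choice made for this example); the function names are made up.

```shell
# mpeg2_cmd OUTPUT: print a plain 25fps MPEG-2 command line at 8000kbps
mpeg2_cmd() {
    echo "mencoder \"mf://*.jpg\" -mf fps=25:type=jpg -ovc lavc -lavcopts vcodec=mpeg2video:vbitrate=8000 -nosound -of mpeg -o $1"
}

# mpeg2_dvd_cmd OUTPUT: print a PAL-DVD-style command line (4:3, 720x576)
mpeg2_dvd_cmd() {
    echo "mencoder \"mf://*.jpg\" -mf fps=25:type=jpg -ovc lavc -lavcopts vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=8000:keyint=15:vstrict=0:aspect=4/3 -vf scale=720:576,harddup -ofps 25 -nosound -of mpeg -mpegopts format=dvd:tsaf -o $1"
}

# Example:
#   mpeg2_dvd_cmd output.mpg | sh
```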

If you want to make a PAL DVD of your video, you will need to use a suitable DVD authoring software such as dvdauthor to create the DVD file system.

X264 (HD video)

This is usually the preferred format for HD video, and is suitable for making videos for HDTV display screens. However, you will need to have x264 installed (simply doing 'yum install x264' on most Linux distributions should work). Note that even though you can use x264 with the -ovc option with mencoder, x264 is not a codec as such, but an actual video encoder in its own right, and you can use the command 'x264 ...' directly from the command line if you wish.

Like MPEG-2, X264 can only support 25fps (at least when used with mencoder).

To use mencoder with the x264 encoder for HD video, do

To use 2-pass encoding, do

Use the '-vf scale=⟨width⟩:⟨height⟩' option to make the output dimensions divisible by 16 - this will improve compression and quality. The scaling algorithm used by mencoder is bicubic, so the quality of the scaled video output is still high.

The 16000kbps bitrate is typical of HDTV transmissions, and HD-DVD/Blu-ray have much higher maximum bitrates. For low-complexity stop motion animations, 16000kbps is more than sufficient.
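A sketch of the corresponding mencoder command lines is given below, using the '-ovc x264' and '-x264encopts' options with a 16000kbps bitrate. As before, the function only prints the command, its name is invented, and '-nosound' is an addition made for this example.

```shell
# x264_cmd OUTPUT [PASS]: print an x264 command line at 16000kbps.
# PASS may be 1 or 2 for the two halves of a 2-pass encode.
x264_cmd() {
    echo "mencoder \"mf://*.jpg\" -mf fps=25:type=jpg -ovc x264 -x264encopts bitrate=16000${2:+:pass=$2} -nosound -o $1"
}

# Single pass:
#   x264_cmd output.avi | sh
# 2-pass:
#   x264_cmd /dev/null 1 | sh
#   x264_cmd output.avi 2 | sh
```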

Flash video for embedding on web pages

To generate a .swf flash video file, do

The same command can be used to generate a .flv video - just change the file extension of the output video file. Use the '-vf scale=' option to make the size of the video fit onto an appropriate browser window, rather than let the browser automatically fit it to its own window, which typically results in far poorer quality pictures.

Final notes

The .avi format is simply the standard mencoder container format; it is the codec (mpeg-2/4, x264) specified with the -ovc and vcodec options that determines the actual video format. You can force a particular output format using the -of option (e.g. '-of mpeg' or '-of rawvideo'). In the case of x264 encoding, .264 is the usual extension for a raw video stream.

Apart from perhaps MPEG-4 videos made from low pixel dimension images, 2-pass encoding is usually not worth doing, even for X264 HD videos, and almost never worth doing for MPEG-2.

The end result
Here is the video produced using the method described above.