The objective of image compression is to remove redundant or unnecessary information from an image in order to reduce the bandwidth or storage space it requires. As with audio signals, both lossless and lossy algorithms are available, and the choice depends on the application the image is meant for.
Note that when analysing images, all transformations are considered two-dimensional by default.
Various colour spaces have been defined to describe how an image is stored in a computer. The most widely used ones are listed below.
| Abbreviation | Meaning | Explanation |
| --- | --- | --- |
| RGB | Red, Green, Blue | Each pixel is an additive combination of three colours of light. The highest levels of all three colours combine to white. It is used wherever colour is produced or captured as light (displays, cameras, scanners). |
| RGBA | Red, Green, Blue, Alpha | Same as RGB, with an additional alpha channel that describes transparency. |
| YCbCr or YUV | Luma (Y), blue-difference chroma (Cb, U), red-difference chroma (Cr, V) | Luma scales from black to white. The two chroma components are calculated from the RGB source. Although the exact calculations differ, YCbCr is often also referred to as YUV. |
| CMYK | Cyan, Magenta, Yellow, Black (Key) | Each point is a subtractive combination of four inks. The highest levels of all four combine to black. It is used in print. |
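The chroma components can be calculated from an RGB source as in the sketch below. It uses the full-range conversion defined by JFIF, the file format commonly used for JPEG images. Other standards, such as limited-range video YCbCr, use different coefficients. The function names are illustrative.

```python
def clamp8(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple:
    """Full-range (JFIF-style) RGB -> YCbCr conversion for 8-bit samples."""
    y = 0.299 * r + 0.587 * g + 0.114 * b             # luma: weighted brightness
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma
    return clamp8(y), clamp8(cb), clamp8(cr)


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Inverse conversion (exact up to rounding)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


# Grey pixels carry no colour information: both chroma values stay at 128.
print(rgb_to_ycbcr(100, 100, 100))  # (100, 128, 128)
```

Because all brightness ends up in Y, an encoder can store Cb and Cr with less precision (or at lower resolution) without the eye noticing much.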
One of the most commonly used lossy image formats for photographs is JPEG. It is named after the Joint Photographic Experts Group, the committee formed in 1986 that created the standard, which was published in 1992. JPEG typically achieves compression of about 10:1 with little perceptible loss in quality.
The JPEG algorithm is based on the two-dimensional Discrete Cosine Transform (DCT). The input image is first converted into the YCbCr colour space, which separates brightness from colour information. Because the human eye is less sensitive to colour detail than to brightness, the chroma components can be compressed more heavily than would be possible in RGB. The image is then split into non-overlapping 8x8 pixel blocks, and each block is transformed with the DCT. The resulting coefficients are quantized, which sets many insignificant (mostly high-frequency) coefficients to zero. This is where the lossy compression occurs. The coefficients are then arranged into a one-dimensional sequence and losslessly entropy coded. The image components (Y, Cb and Cr) are encoded in turn.
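The sketch below illustrates the transform step with a direct, slow implementation of the orthonormal 2-D DCT-II on one 8x8 block. Real encoders use fast factorised algorithms. The level shift by 128 follows the JPEG convention, and the function names are illustrative.

```python
import math

N = 8  # JPEG block size


def _c(k: int) -> float:
    """Normalisation factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(i: int, k: int) -> float:
    """Cosine basis value for sample position i and frequency k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct_2d(block):
    """2-D DCT-II: coefficient [u][v], u = vertical and v = horizontal frequency."""
    return [[_c(u) * _c(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                 for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct_2d(coeffs):
    """Inverse 2-D DCT (DCT-III), reconstructing the N x N block."""
    return [[sum(_c(u) * _c(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


# A block that only changes from left to right (a horizontal gradient).
pixels = [[100 + 2 * col for col in range(N)] for _ in range(N)]
shifted = [[p - 128 for p in row] for row in pixels]  # JPEG level shift
coeffs = dct_2d(shifted)
```

All the energy of this block ends up in the first row of coefficients, and the DC value is 8 times the block mean. This concentration of energy into a few coefficients is what makes the subsequent quantization effective.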
Quantization is the key to compression in the JPEG algorithm. It is not uniform across frequencies. Each of the 64 DCT coefficients has its own step size in a quantization matrix, and high frequencies get larger steps because the human eye is more sensitive to changes at lower spatial frequencies. To make the trade-off between quality and compression ratio adjustable, a quality factor qf ranging from 1 to 100 is defined, which scales the quantization matrix.
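The sketch below shows how a quality factor can scale a quantization matrix and how the coefficients are then quantized. The base table is the example luminance table from Annex K of the JPEG standard. The scaling formula is the one used by the Independent JPEG Group's libjpeg. The standard does not mandate either, so other encoders may scale differently.

```python
# Example luminance quantization table (JPEG standard, Annex K).
BASE_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_table(base, quality: int):
    """Scale a quantization table by a quality factor 1..100 (libjpeg-style)."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be between 1 and 100")
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Each step is rounded and kept within 1..255 (8-bit baseline tables).
    return [[min(255, max(1, (q * scale + 50) // 100)) for q in row] for row in base]


def quantize(coeffs, table):
    """Divide each DCT coefficient by its step size and round: the lossy step."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply back; the rounding error cannot be recovered."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]
```

Quality 50 leaves the table unchanged. Lower values enlarge the steps, so more coefficients round to zero, and quality 100 reduces every step to 1.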
To collate the coefficients, a “zig-zag” scan is used, starting in the upper-left (DC) corner of each block. If each 8x8 block is encoded completely before moving on to the next one, the encoding is called baseline JPEG. Another approach is to encode the upper-left coefficients of all blocks first, and then continue with the next coefficient position in each block. This approach is called progressive JPEG. It has the advantage that the image can be gradually reconstructed while it is being downloaded. JPEG also offers a hierarchical mode, in which the image is encoded as a layered pyramid. Each pixel of a coarser (upper) layer is obtained by applying a certain operation to the 2x2 pixels lying in the layer directly below it. The decoder can decode each layer separately, which allows for multi-resolution images.
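The sketch below illustrates the zig-zag scan and the difference between the two orderings. The progressive ordering is deliberately simplified to one coefficient position per pass. Real progressive JPEG groups ranges of coefficients into scans and can also refine them bit by bit. The function names are illustrative.

```python
def zigzag_order(n: int = 8):
    """(row, col) positions of an n x n block in JPEG zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]

    def key(rc):
        r, c = rc
        d = r + c  # index of the anti-diagonal
        # Odd diagonals are walked down-left (row increasing), even ones up-right.
        return (d, r if d % 2 else -r)

    return sorted(cells, key=key)


def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def baseline_order(blocks):
    """Baseline: all 64 coefficients of one block, then the next block."""
    return [coef for block in blocks for coef in zigzag_scan(block)]


def progressive_order(blocks):
    """Simplified progressive: position 0 of every block, then position 1, ..."""
    scans = [zigzag_scan(block) for block in blocks]
    return [scan[k] for k in range(len(scans[0])) for scan in scans]


block = [[8 * r + c for c in range(8)] for r in range(8)]  # value = raster index
print(zigzag_scan(block)[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

After quantization, the high-frequency coefficients at the end of the zig-zag sequence are mostly zero. This produces long runs that the entropy coder can represent compactly.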
JPEG also supports lossless coding, based on predictive coding followed by lossless variable-length (entropy) coding. This mode omits the DCT and quantization steps. The typical compression ratio is approximately 2:1.
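Lossless JPEG predicts each sample from its already coded neighbours and entropy codes only the prediction error. The neighbours are a (left), b (above) and c (above-left), and there are seven predictors to choose from. The sketch below follows those predictor definitions and the usual edge rules: the first row is predicted from the left, the first column from above, and the first sample from 2^(P-1). It omits the modulo wrap-around of residuals and the entropy coding stage, and the function names are illustrative.

```python
def predict(a: int, b: int, c: int, mode: int) -> int:
    """Lossless-JPEG predictors 1..7 (a = left, b = above, c = above-left)."""
    predictors = {
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + ((b - c) >> 1),   # halving done as an arithmetic right shift
        6: b + ((a - c) >> 1),
        7: (a + b) >> 1,
    }
    return predictors[mode]


def _prediction(img, y, x, mode, precision):
    if y == 0 and x == 0:
        return 1 << (precision - 1)          # first sample: mid-grey
    if y == 0:
        return img[y][x - 1]                 # first row: left neighbour
    if x == 0:
        return img[y - 1][x]                 # first column: neighbour above
    return predict(img[y][x - 1], img[y - 1][x], img[y - 1][x - 1], mode)


def encode_residuals(img, mode, precision=8):
    """Replace every sample by its prediction error."""
    return [[img[y][x] - _prediction(img, y, x, mode, precision)
             for x in range(len(img[0]))] for y in range(len(img))]


def decode_residuals(res, mode, precision=8):
    """Rebuild the image sample by sample, predicting from decoded values."""
    img = [[0] * len(res[0]) for _ in res]
    for y in range(len(res)):
        for x in range(len(res[0])):
            img[y][x] = res[y][x] + _prediction(img, y, x, mode, precision)
    return img
```

On a smooth gradient, predictor 4 (a + b - c) turns every interior sample into a zero residual. Such small, repetitive values are what the entropy coder compresses well.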
The JPEG 2000 format was intended as the successor to the original JPEG format, which, despite its imperfections, is still far more widely used. JPEG 2000 extends the capabilities of its predecessor, improves the quality/compression ratio and allows scalable compression from lossy to lossless. Other improvements include region-of-interest coding, in which important parts of an image are coded more precisely than the rest.
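The transform was changed from the DCT to the Discrete Wavelet Transform (DWT), applied in two dimensions. The original image is wavelet transformed, quantized and entropy coded. Unlike the DCT, the DWT does not transform fixed 8x8 blocks. It is applied to the whole image (or to large tiles) and works recursively. Each level splits the image into four subbands: one low-pass subband and three detail subbands. The low-pass subband is then split again in the same way, and so on.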
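The recursive subband splitting can be illustrated with the simplest wavelet, the Haar wavelet, applied to 2x2 pixel groups. JPEG 2000 itself uses the longer 5/3 and 9/7 wavelet filters. Haar is used here only because it keeps the arithmetic short. The subband names follow the common LL/HL/LH/HH convention, where the first letter refers to the horizontal filter and the second to the vertical one. The function names are illustrative.

```python
def haar_split(img):
    """One 2-D Haar level: returns (LL, HL, LH, HH), each half the size of img."""
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]          # top-left, top-right
            s, t = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            rows[0].append((p + q + s + t) / 4)  # LL: local average
            rows[1].append((p - q + s - t) / 4)  # HL: left-right difference
            rows[2].append((p + q - s - t) / 4)  # LH: top-bottom difference
            rows[3].append((p - q - s + t) / 4)  # HH: diagonal difference
        for band, row in zip((ll, hl, lh, hh), rows):
            band.append(row)
    return ll, hl, lh, hh


def haar_merge(ll, hl, lh, hh):
    """Inverse of haar_split."""
    img = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            a, h, v, d = ll[i][j], hl[i][j], lh[i][j], hh[i][j]
            top += [a + h + v + d, a - h + v - d]
            bottom += [a + h - v - d, a - h - v + d]
        img += [top, bottom]
    return img


def decompose_2d(img, levels):
    """Split repeatedly, re-splitting only the LL subband at each level."""
    details = []
    for _ in range(levels):
        img, hl, lh, hh = haar_split(img)
        details.append((hl, lh, hh))
    return img, details


def reconstruct_2d(ll, details):
    """Merge the detail subbands back in, coarsest level first."""
    for hl, lh, hh in reversed(details):
        ll = haar_merge(ll, hl, lh, hh)
    return ll
```

An 8x8 image decomposed over three levels leaves a single LL value (the image mean) plus three sets of detail subbands. For a flat image, all detail values are zero.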
A wavelet is a short, wave-like oscillation. Unlike, for example, the sine function, which extends from minus infinity to plus infinity, it has a beginning, an amplitude and an end. Wavelets can have various shapes, chosen according to the signal being analysed.
The wavelet transform essentially measures how similar the wavelet is to each part of the analysed signal. Because the wavelet has a finite length, it can be shifted along the signal and stretched or compressed to any scale. If the scales are chosen in defined steps (typically powers of two), a multi-resolution wavelet spectrum is obtained.
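The comparison of a stretched and shifted wavelet with the signal can be computed directly as an inner product. The sketch below uses the Haar wavelet and evaluates every shift at scales 2, 4 and 8 (powers of two). It is a slow illustration of the idea, not the fast DWT used in codecs, and the function names are illustrative.

```python
import math


def haar(t: float) -> float:
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1:
        return -1.0
    return 0.0


def coefficient(signal, scale: int, shift: int) -> float:
    """Similarity (inner product) of the signal with the wavelet stretched to
    `scale` samples and moved to position `shift`, normalised by sqrt(scale)."""
    return sum(signal[shift + k] * haar(k / scale) for k in range(scale)) / math.sqrt(scale)


def dyadic_spectrum(signal, levels: int):
    """Coefficients for scales 2, 4, 8, ... at every possible shift."""
    return {2 ** j: [coefficient(signal, 2 ** j, b) for b in range(len(signal) - 2 ** j + 1)]
            for j in range(1, levels + 1)}


step = [0] * 8 + [1] * 8  # a single edge between samples 7 and 8
spectrum = dyadic_spectrum(step, 3)
```

Wherever the wavelet covers a flat region, the coefficient is zero. Only positions where the wavelet overlaps the edge give a non-zero response, so the spectrum localises the edge at every scale.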
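The DWT produces two sets of samples: low-pass (approximation) samples and high-pass (detail) samples. The low-pass samples can be transformed again to give the next, coarser level of resolution. Reconstructing the signal requires the high-pass samples from each level of resolution together with the low-pass samples of the coarsest level. The high-pass samples represent the details that must be added to a lower level of resolution to construct the next higher level.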
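The split into low-pass and high-pass samples, and the reconstruction from them, can be sketched with the reversible 5/3 (Le Gall) lifting scheme. This is the integer wavelet that JPEG 2000 uses for lossless coding. The sketch has these limitations:

- It works in one dimension only; a real codec applies the transform along both rows and columns.
- It assumes an even-length integer signal at each level, whereas a real codec also handles odd lengths.
- It uses symmetric extension at the borders.

It keeps the high-pass samples of every level together with the low-pass samples of the coarsest level. The function names are illustrative.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform -> (low, high)."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch needs an even length >= 2")

    def at(i):  # symmetric extension past the right edge: x[n] -> x[n - 2]
        return x[i] if i < n else x[2 * (n - 1) - i]

    half = n // 2
    # Predict step: odd samples minus the average of their even neighbours.
    high = [x[2 * k + 1] - (x[2 * k] + at(2 * k + 2)) // 2 for k in range(half)]
    # Update step: even samples corrected by neighbouring details (high[-1] mirrors to high[0]).
    low = [x[2 * k] + (high[max(k - 1, 0)] + high[k] + 2) // 4 for k in range(half)]
    return low, high


def inverse_53(low, high):
    """Undo forward_53 exactly (integer arithmetic, no rounding loss)."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    for k in range(half):
        x[2 * k] = low[k] - (high[max(k - 1, 0)] + high[k] + 2) // 4
    for k in range(half):
        right = x[2 * k + 2] if 2 * k + 2 < n else x[n - 2]
        x[2 * k + 1] = high[k] + (x[2 * k] + right) // 2
    return x


def decompose_1d(x, levels):
    """Repeatedly split the low-pass band; keep every level's high-pass band."""
    highs = []
    for _ in range(levels):
        x, high = forward_53(x)
        highs.append(high)
    return x, highs


def reconstruct_1d(low, highs):
    """Add the details back, coarsest level first."""
    for high in reversed(highs):
        low = inverse_53(low, high)
    return low


signal = [12, 14, 15, 15, 40, 41, 39, 38, 10, 11, 13, 12, 12, 12, 60, 61]
low, highs = decompose_1d(signal, 3)
```

After three levels, the 16 samples become 2 low-pass values plus 8 + 4 + 2 high-pass values. Feeding them back reproduces the original signal exactly, which is why this filter suits lossless coding.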
The Graphics Interchange Format (GIF) is still a popular image file format on the Internet. Introduced in 1987, it is a bitmap format with an 8-bit colour palette and a good compression ratio. Because the palette is limited to at most 256 colours, GIF is of limited use for high-fidelity images such as photos. It is, however, suitable for images with few colours, such as logos, with sharp edges and minimal colour transitions. The second version of the format (GIF89a, 1989) added support for transparency.
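GIF compresses the image data, which is a sequence of indices into the colour palette, with the Lempel-Ziv-Welch (LZW) algorithm. LZW builds a dictionary of index sequences as it reads the data and replaces each sequence already in the dictionary with a single code.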
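The sketch below shows the core of LZW applied to a list of palette indices. It is simplified and omits several parts of GIF's variant: variable-width codes, the special clear and end-of-information codes, the 4096-entry dictionary limit and bit packing. The function names are illustrative.

```python
def lzw_encode(data, alphabet_size):
    """Compress a list of symbols (palette indices) into a list of codes."""
    dictionary = {(i,): i for i in range(alphabet_size)}
    next_code = alphabet_size
    w, out = (), []
    for symbol in data:
        wc = w + (symbol,)
        if wc in dictionary:
            w = wc                         # keep extending the current match
        else:
            out.append(dictionary[w])      # emit the longest known sequence
            dictionary[wc] = next_code     # and remember it plus one symbol
            next_code += 1
            w = (symbol,)
    if w:
        out.append(dictionary[w])
    return out


def lzw_decode(codes, alphabet_size):
    """Rebuild the symbol list; the dictionary is reconstructed on the fly."""
    if not codes:
        return []
    dictionary = {i: (i,) for i in range(alphabet_size)}
    next_code = alphabet_size
    w = dictionary[codes[0]]
    out = list(w)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:            # code defined by this very step
            entry = w + (w[0],)
        else:
            raise ValueError("invalid LZW code")
        out.extend(entry)
        dictionary[next_code] = w + (entry[0],)
        next_code += 1
        w = entry
    return out


print(lzw_encode([0] * 6, 2))  # [0, 2, 3]
```

A run of six identical indices already shrinks to three codes. Longer runs and repeated patterns, which are typical of logos and flat graphics, shrink much further.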
Even though newer and more advanced formats such as PNG exist, GIF remains popular thanks to its support for animation, in which multiple images (frames) are stored in one file and displayed one after another or on top of each other. This feature has also been exploited to display true-colour (24-bit) images and animations. Several frames are layered, each with its own palette of up to 256 colours, so that together they show more colours than a single palette allows.
Portable Network Graphics (PNG) is a bitmap image format designed to replace GIF, which suffered from licensing issues (its LZW compression was patented) and technical limitations. PNG was proposed in 1996 and became an international ISO/IEC standard in 2004. It supports RGB and RGBA with 8 bits per channel (24-bit RGB or 32-bit RGBA), as well as greyscale, palette-based images and 16 bits per channel.
The PNG format is very flexible thanks to its container-like structure. A file consists of a series of “chunks”, which allows additional information such as metadata to be stored alongside the image data and allows the data to be streamed.
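The chunk structure is simple enough to write and read by hand. After an 8-byte file signature, every chunk consists of four parts:

- a 4-byte big-endian length;
- a 4-letter type;
- the data;
- a CRC-32 computed over the type and data.

The sketch below builds a minimal 1x1 red truecolour PNG with an extra tEXt metadata chunk and then lists the chunks back. The helper names are illustrative. A real reader would also validate chunk order and the required critical chunks.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Length + type + data + CRC-32 computed over type and data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk, checking every CRC."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack_from(">I", png, pos + 8 + length)
        if crc != zlib.crc32(chunk_type + data) & 0xFFFFFFFF:
            raise ValueError(f"CRC mismatch in {chunk_type!r}")
        yield chunk_type.decode("ascii"), data
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)


# IHDR: width, height, bit depth 8, colour type 2 (RGB), compression, filter, interlace.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
# IDAT: one scanline = filter-type byte 0 followed by one red RGB pixel, DEFLATE-compressed.
idat = zlib.compress(bytes([0, 255, 0, 0]))
png = (PNG_SIGNATURE
       + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"tEXt", b"Comment\x00made by hand")
       + make_chunk(b"IDAT", idat)
       + make_chunk(b"IEND", b""))

print([name for name, _ in iter_chunks(png)])  # ['IHDR', 'tEXt', 'IDAT', 'IEND']
```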
The lossless compression works in two stages:
First, in the filtering (pre-compression) stage, the image data is transformed using a method similar to DPCM. Each byte is stored as the difference between its value and a prediction from the pixel to the left, the pixel above, the pixel above-left or a combination of these. A different filter type may be chosen for each line (scanline) of pixels. Second, the filtered values are compressed with the DEFLATE algorithm. DEFLATE replaces duplicate strings with back-references and applies Huffman coding to blocks of data rather than to the whole image at once.
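The sketch below illustrates the two stages for 8-bit samples. The five filter types (None, Sub, Up, Average, Paeth) and the Paeth predictor follow the PNG specification. The per-scanline choice of the filter with the smallest sum of absolute (signed) outputs is only a heuristic suggested by the specification, not a requirement. `zlib.compress` supplies the DEFLATE stage, and the function names are illustrative.

```python
import zlib


def paeth(a: int, b: int, c: int) -> int:
    """Paeth predictor: whichever of left, above, upper-left is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def _pred(ftype: int, a: int, b: int, c: int) -> int:
    """Filter types 0..4: None, Sub, Up, Average, Paeth."""
    return (0, a, b, (a + b) // 2, paeth(a, b, c))[ftype]


def filter_row(ftype, row, prev, bpp):
    """a = byte bpp positions to the left, b = byte above, c = above-left."""
    out = bytearray()
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        out.append((x - _pred(ftype, a, prev[i], c)) % 256)
    return bytes(out)


def unfilter_row(ftype, frow, prev, bpp):
    """Reverse filter_row, predicting from already reconstructed bytes."""
    out = bytearray()
    for i, f in enumerate(frow):
        a = out[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        out.append((f + _pred(ftype, a, prev[i], c)) % 256)
    return bytes(out)


def compress_rows(rows, bpp):
    """Filter each scanline with its cheapest filter, then DEFLATE everything."""
    prev, stream = bytes(len(rows[0])), bytearray()
    for row in rows:
        candidates = [filter_row(t, row, prev, bpp) for t in range(5)]
        # Heuristic: treat output bytes as signed and minimise their absolute sum.
        cost = [sum(min(v, 256 - v) for v in cand) for cand in candidates]
        best = cost.index(min(cost))
        stream += bytes([best]) + candidates[best]
        prev = row
    return zlib.compress(bytes(stream))


def decompress_rows(data, row_len, bpp):
    """Inflate, then unfilter each scanline using its stored filter type."""
    raw, prev, rows = zlib.decompress(data), bytes(row_len), []
    for pos in range(0, len(raw), row_len + 1):
        row = unfilter_row(raw[pos], raw[pos + 1:pos + 1 + row_len], prev, bpp)
        rows.append(row)
        prev = row
    return rows
```

Filtering does not shrink the data by itself. It turns smooth gradients into small, repetitive differences, which DEFLATE then compresses much better.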
An early name proposed for the format was PING, a recursive acronym for “PING is not GIF”.
For photo-like images with subtle colour transitions, PNG produces larger files than JPEG.
However, JPEG handles sharp transitions and edges poorly, for example text, line art or graphics on large areas of solid colour, and produces visible artefacts around them. For such images, PNG usually achieves better compression and, being lossless, introduces no artefacts. This makes it well suited to graphics on the web.
The youngest of these formats, WebP, was introduced by Google in 2010. It is presented as a new open standard to compete with the still very popular JPEG. WebP combines the main features of the following formats:

- JPEG: good performance with true-colour images;
- JPEG 2000: both lossy and lossless compression;
- PNG: alpha transparency, available in both lossy and lossless modes;
- GIF: animation support.
The lossy algorithm is based on the intra-frame (key-frame) coding of the VP8 video format. Each block is predicted from already encoded neighbouring blocks: three blocks above (above-left, above and above-right) and one block to the left. The prediction uses one of four modes: horizontal, vertical, DC (a single average colour) and TrueMotion. The prediction residual, which is the difference between the actual block and its prediction, is transformed in 4x4 pixel sub-blocks with the DCT or the Walsh-Hadamard transform. The output is then quantized and entropy coded.
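The four prediction modes can be sketched for a single 4x4 block as below. In this simplified illustration, `above`, `left` and `above_left` are the already decoded neighbouring pixels, and the mode is chosen by the smallest sum of absolute differences. Some VP8 details are not reproduced: the extra directional modes VP8 defines for sub-blocks, the default values used at image edges and its exact rounding rules. The function names are illustrative.

```python
def clamp255(v: int) -> int:
    return max(0, min(255, v))


def predict_block(mode, above, left, above_left, n=4):
    """Build an n x n prediction from the row above and the column to the left."""
    if mode == "V":   # vertical: copy the row above downwards
        return [list(above[:n]) for _ in range(n)]
    if mode == "H":   # horizontal: copy the left column across
        return [[left[r]] * n for r in range(n)]
    if mode == "DC":  # one colour: rounded average of all 2n neighbours
        dc = (sum(above[:n]) + sum(left[:n]) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    if mode == "TM":  # TrueMotion: left + above - above_left, clamped
        return [[clamp255(left[r] + above[c] - above_left) for c in range(n)]
                for r in range(n)]
    raise ValueError(f"unknown mode {mode!r}")


def choose_mode(block, above, left, above_left):
    """Pick the mode with the smallest sum of absolute differences."""
    n = len(block)

    def sad(pred):
        return sum(abs(block[r][c] - pred[r][c]) for r in range(n) for c in range(n))

    preds = {m: predict_block(m, above, left, above_left, n) for m in ("DC", "V", "H", "TM")}
    mode = min(preds, key=lambda m: sad(preds[m]))
    # The residual is what gets transformed, quantized and entropy coded.
    residual = [[block[r][c] - preds[mode][r][c] for c in range(n)] for r in range(n)]
    return mode, residual
```

When the chosen mode predicts the block perfectly, the residual is all zeros and costs almost nothing to code.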
Apart from standard techniques such as dictionary coding and Huffman coding, the lossless algorithm uses more advanced techniques. These include separate entropy codes for different colour channels and a colour cache of recently used colours.
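A colour cache can be sketched as a small hash table indexed by the colour value. When a pixel's colour is already in the cache, the encoder can emit a short cache index instead of the full 32-bit ARGB value. The multiplicative hash constant 0x1e35a7bd is taken from the WebP lossless bitstream specification as we understand it, so treat it as an assumption. The token format and the names are illustrative. The real format mixes cache indices with literals and backward references inside its entropy-coded stream.

```python
class ColourCache:
    """Hash-indexed cache of recently seen ARGB colours (2**bits slots)."""

    def __init__(self, bits: int):
        self.bits = bits
        self.slots = [None] * (1 << bits)

    def index(self, argb: int) -> int:
        # 32-bit multiplicative hash; the top `bits` bits select the slot.
        return ((0x1E35A7BD * argb) & 0xFFFFFFFF) >> (32 - self.bits)

    def insert(self, argb: int) -> None:
        self.slots[self.index(argb)] = argb  # a newer colour may evict an older one


def cache_encode(pixels, bits=4):
    """Emit ("cache", index) for cached colours, ("literal", argb) otherwise."""
    cache, tokens = ColourCache(bits), []
    for argb in pixels:
        i = cache.index(argb)
        tokens.append(("cache", i) if cache.slots[i] == argb else ("literal", argb))
        cache.insert(argb)
    return tokens


def cache_decode(tokens, bits=4):
    """Mirror the encoder: the decoder updates an identical cache."""
    cache, pixels = ColourCache(bits), []
    for kind, value in tokens:
        argb = cache.slots[value] if kind == "cache" else value
        pixels.append(argb)
        cache.insert(argb)
    return pixels
```

Because encoder and decoder update their caches identically, the cache itself never has to be transmitted.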
Compared with JPEG and PNG, WebP is reported to produce files at least 20% smaller for the image types each of those formats targets. WebP can be used on Linux and Windows via plugins. It is supported by Chrome, Firefox and Opera, as well as by current versions of Safari and Edge.