Luma and Chroma
Back when dinosaurs roamed the earth, a team at RCA was wrestling with how to compress color television signal to only 6Mhz. Fortunately, Alda and team had realized that the human eye is a silly apparatus and can easily be tricked. Specifically, the eye is much more sensitive to changes in brightness than color. In this way you can reduce color detail so long as you keep the brightness detail. Even better, the faint glow of the TV tricks your eyes to think it is daylight so the gender difference of the density of rods and cones virtually becomes negligible.
This is why green and blue can look the same sometimes – even to non colorblind eyes. Pro tip: never play strategy board games in low light to avoid accidentally confusing colors.
When the RCA team solved the NTSC over-the-air TV problem, they focused on splitting the Luma (brightness) and chroma (color) signal. This way an old black and white tv could work with just the brightness part and a color TV would overlay the brightness on top of the color. Perfect backward compatibility.
- Luma describes the lightness or darkness of a color. With just Luma, you would have a black and white image. In other words, while Instagram has convinced you that b&w photos are cool, the reality is really it allows them to save bytes per image.
- Chroma is only the color detail. Adding Luma to Chroma produces the color you expect. If you ever use the color picker in office or photoshop, you probably remember that you can select a color by RGB value or sometimes using Hue-Saturation-Brightness values. Functionally they are they result to the same thing and can be converted one to the other. Hue uses a color wheel and then you apply lightness and darkness (saturation and brightness) to the color.
There are many nuances when it comes to expressing Chroma-Subsampling. There are several different notations and variations each with different histories. The most common notation is J:a:b and usual values are 4:4:4 or 4:2:0.
The JPEG notation (exposed by ImageMagick for example) is a little difficult to understand. It uses a H1xV1,H2xV2,H3xV3 notation. For quick reference:
4:4:4 === 1x1,1x1,1x1
4:2:0 === 2x2,1x1,1x1
Fundamentally, this notation expresses how to sample just the color while leaving the luma intact. You can read it like this: Given 4 pixels wide (J), how many unique unique color-pixels in row 1, and row 2 should be used. In this way:
4:4:4 use 4 unique pixels on the first row, and 4 unique on the second.
4:2:0 means use 2 unique pixels (every other entry) and use the same value on the second row as below
This can reduce the amount of data in an image. Put another way:
The result is that luma and chroma details become greatly simplified in the JPEG byte stream. For a 4:2:0 subsampling, it will functionally look like this:
If Chroma-Subsampling is such an old science, I wanted to understand how well it is used in the wild. Some photo editing programs, such as imagemagick, default to use 4:2:0 based on quality settings or other heuristics. Yet, there are many different variations in subsampling rates and each with its own subtle nuances.
From this image set, I ran an analysis on each image to determine the characteristics of each image. The simplest way to do this is to use imagemagick’s identify tool. (you can also use exiftool, but identify allows me to format the output for later user)
identify -format "%[jpeg:sampling-factor]" images/0c/90/f85bb0bcbc2e012faffde9ae4481.jpg
This will yield a result like: 1x1,1x1,1x1 or 2x2,1x1,1x1. ImageMagick uses JPEG’s funny notation for subsampling with a number of redundant variations. A deeper explanation can be found on page 9 of Doug Kerr’s Chroma Subsampling in Digital Images. It also includes a good reference table to convert the H1xV1,H2xV2,H3xV3 notation to J:a:b.
Running this against the JPEG dataset has the following results: