High-DPI, Subpixel Text Positioning, Hinting
What happens when an unstoppable bullet hits an impenetrable wall?[1]
Behdad Esfahbod
behdad@google.com
August 30, 2012
When getting text onto the screen, one has many choices and decisions to make, among them whether to use grayscale anti-aliasing, subpixel anti-aliasing, subpixel text positioning, hinting of glyph shapes, and hinting of metrics.
What you decide to do largely depends on the type of application, layout requirements, display device, and how much control you have over those. It’s best to learn to think of these different knobs as, well, different knobs. You can turn each one on or off, or otherwise adjust it, independently of the others. Some combinations make more sense than others, but in general you can still change them independently.
Note that when most people talk about subpixel text positioning, what they really mean is subpixel text positioning and no metrics hinting. Technically, nothing stops you from having subpixel text positioning and still hint metrics. Though that’s not what most people mean or do.
It also helps to learn to think of text rendering as two separate processes: layout and rendering. Layout is the process of deciding where to show each glyph. Rendering is actually showing glyphs at those positions given all the constraints of the display medium.
When it comes to layout, there are two opposite directions you can go: linear and non-linear. Glyph positions produced by a linear layout function can be transformed by an affine (or even projective, if you are careful) transformation, and the result is exactly what you would have gotten had the font scale matrix been transformed by that same transformation before layout. I.e., linearly laying out a paragraph at 12pt to a width of 4in will result in the exact same look and line breaks as setting it at 24pt to a width of 8in. That’s a very nice property, because it means that you can zoom, rotate, translate, shear, even project the layout results freely.
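The 12pt / 4in versus 24pt / 8in claim can be checked with a small sketch. Everything here is an assumption made for illustration: the advance widths are made-up em fractions for a three-character "font", and the line breaker is a plain greedy one, not what any real engine does. Exact fractions are used so that rounding noise cannot affect the comparison.

```python
from fractions import Fraction

# Hypothetical advance widths, as fractions of the font size (em units).
ADVANCE_EM = {"i": Fraction(1, 4), "m": Fraction(3, 4), " ": Fraction(1, 4)}

def linear_width(word, font_size):
    """Unrounded (linear) width of a word at the given font size."""
    return sum(ADVANCE_EM[ch] for ch in word) * font_size

def break_lines(text, font_size, max_width):
    """Greedy line breaking that uses only linear (unhinted) metrics."""
    space = ADVANCE_EM[" "] * font_size
    lines, current, width = [], [], Fraction(0)
    for word in text.split():
        w = linear_width(word, font_size)
        needed = w if not current else width + space + w
        if current and needed > max_width:
            # The word does not fit: close the line and start a new one.
            lines.append(" ".join(current))
            current, width = [word], w
        else:
            current.append(word)
            width = needed
    if current:
        lines.append(" ".join(current))
    return lines

text = " ".join(["mi im mmi iim mim imi mm ii mi"] * 8)
small = break_lines(text, 12, 4 * 72)   # 12pt text in a 4in (288pt) column
large = break_lines(text, 24, 8 * 72)   # 24pt text in an 8in (576pt) column
print(small == large)
```

Because every width is the em advance times the font size, doubling both the font size and the column width doubles both sides of every comparison the line breaker makes, so the break decisions cannot change.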
Non-linear layout would be different. For example, you may decide (for many legitimate reasons), that at 10px size, the glyph for letter ‘i’ should take 2 pixels of space (one column of black stem, one column of white space after). But the same glyph, at size 20px may take only 3 pixels (one full black stem in the middle column, and two very light gray columns on the sides). That’s clearly non-linear, because although the font size was increased 100%, the glyph width only increased 50%. Note that this is not a matter of local error. If you consider a string of i’s (“iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii”), the whole string is now only 50% wider than the one “half the size”. Line breaks will be calculated differently, page breaks will be calculated differently, and the document may end up consuming a different number of pages. In short, with non-linear layout all bets are off.
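The ‘i’ example can be reproduced numerically. This is a minimal sketch that models metrics hinting as nothing more than rounding the advance to whole pixels; the 0.16 em advance is a hypothetical value picked so that the numbers match the 2-pixel and 3-pixel widths described above, and the string length of 40 is arbitrary.

```python
import math

I_ADVANCE_EM = 0.16  # hypothetical unhinted advance of 'i', in em units

def hinted_advance(font_size_px):
    """Metrics hinting modeled as rounding the advance to whole pixels."""
    return max(1, math.floor(I_ADVANCE_EM * font_size_px + 0.5))

def string_width(count, font_size_px, hinted):
    """Width in pixels of a run of `count` identical 'i' glyphs."""
    if hinted:
        return count * hinted_advance(font_size_px)
    return count * I_ADVANCE_EM * font_size_px

w10 = string_width(40, 10, hinted=True)   # 40 glyphs * 2 pixels
w20 = string_width(40, 20, hinted=True)   # 40 glyphs * 3 pixels
ratio_hinted = w20 / w10
ratio_linear = string_width(40, 20, hinted=False) / string_width(40, 10, hinted=False)
print(ratio_hinted, ratio_linear)
```

The hinted ratio stays at 1.5 no matter how long the run is, which is the point made above: the rounding error is systematic, so it scales with the text instead of averaging out.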
The world would have been a rosy place if we could only care about linear layout. Alas, not in this world. Damn you Nyquist! Trying frequency-limited sampling on linear layout (i.e., rendering to pixelated screens) results in aliasing. Almost all of the knobs mentioned earlier have the same mandate: to fight aliasing. Some do that at the expense of linearity, some don’t. Here’s how:
Note that out of all the knobs mentioned above, only one introduces non-linearity into layout: metrics hinting. As we said before, some combinations of the knobs make more sense than others. Here are some of the ways that the various knobs interact with each other:
By enabling metrics hinting, you get the much more elegant rendering:
Or this one, if you happen to have subpixel antialiasing going on also:
Make sure you view the images at 100% zoom, or they may show other artifacts. A corollary of this is that if you want linear layout and hence no metrics hinting, you had better have subpixel text positioning supported or things will look bad.
These are some of the conditions under which you may want to decide that non-linear layout is a good match for you:
If you are happy with non-linear layout, good for you. Just implement as many of the rasterization techniques as you can afford (grayscale anti-aliasing, subpixel anti-aliasing, subpixel text positioning), provide as many different hinting options as you care to, and choose the combination that looks best on your target devices given your criteria for “best”.
There are situations in which you have no choice but to do linear layout. In particular:
To handle emerging high-dpi displays, Chrome added a variable called the device-scale-factor. For example, if the display is 240dpi, Chrome will lay out the page as if it were 120dpi, and then scale everything by a factor of 2. There are two issues with this approach:
Not surprisingly, if your layout algorithm is non-linear (metrics hinting enabled), this will break badly, as can be seen in the image below:
Note how glyph spacing is completely off. Check these sequences: “Sea”, “sea”, and “rvi”. This is using hinted metrics suitable for 120dpi, multiplying by two, and using those with rasterization done for 240dpi.
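The mismatch can be modeled in a few lines. This is a simplified sketch, not Chrome's code: hinting is again reduced to rounding advances to whole pixels, the em advances for ‘r’, ‘v’ and ‘i’ are hypothetical, and 13px is an arbitrary nominal size. The layout reserves a slot hinted at the nominal size and doubled, while the rasterizer shapes the glyph for the physical size.

```python
import math

def round_half_up(x):
    """Round to the nearest integer, with halves going up."""
    return math.floor(x + 0.5)

# Hypothetical unhinted advances, in em units.
ADVANCE_EM = {"r": 0.33, "v": 0.51, "i": 0.22}

def slot_and_glyph(ch, nominal_px, scale=2):
    """Return (slot, glyph) widths in physical pixels.

    slot:  advance hinted at the nominal size, then multiplied by the
           device scale factor (what the scaled layout reserves).
    glyph: advance hinted directly at the physical size (what a glyph
           rasterized for the real resolution expects).
    """
    slot = scale * round_half_up(ADVANCE_EM[ch] * nominal_px)
    glyph = round_half_up(ADVANCE_EM[ch] * nominal_px * scale)
    return slot, glyph

mismatch = {}
for ch in "rvi":
    slot, glyph = slot_and_glyph(ch, 13)
    mismatch[ch] = slot - glyph  # negative: cramped, positive: loose
print(mismatch)
```

With these made-up numbers one glyph gets a pixel too little room, one gets a pixel too much, and one happens to fit, which is the kind of uneven spacing visible in sequences like “rvi”.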
Needless to say, something needs to be done about this. We can’t ship that! To confirm that we have hinting enabled, let’s look at the width of an ‘i’ at different sizes:
Yep. Very well-hinted indeed. The values reported are in nominal pixels. Each nominal pixel is two physical pixels wide in the device-scale-factor=2 case.
To solve this problem we explore three different possible solutions in the following sections.
The simplest way to fix the issue is to remove non-linearity during layout. At 240dpi, it’s much easier to get away with no-hinting than it is at 96dpi or even 120dpi. So, we can disable hinting, enable subpixel text positioning, and get beautiful text again:
Now let’s see what happened to ‘i’ measurements:
Look just how linear those widths are. Harmony. Problem solved.
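Why disabling hinting goes hand in hand with enabling subpixel text positioning can be sketched as follows. The 2.6px advance is a hypothetical unhinted value, and snapping to whole pixels stands in for a renderer that can only place glyphs on the pixel grid.

```python
import math

I_ADVANCE_PX = 2.6  # hypothetical unhinted advance of 'i', in pixels

def pen_positions(count, advance):
    """Ideal (fractional) pen positions for a run of identical glyphs."""
    return [k * advance for k in range(count)]

def snap_to_pixels(positions):
    """Positions a renderer without subpixel positioning would use."""
    return [math.floor(p + 0.5) for p in positions]

def gaps(positions):
    """Distances between consecutive glyph origins."""
    return [b - a for a, b in zip(positions, positions[1:])]

ideal = pen_positions(6, I_ADVANCE_PX)
whole = snap_to_pixels(ideal)
print(gaps(ideal))
print(gaps(whole))
```

The ideal origins are evenly spaced, but once snapped the spacing alternates between 2 and 3 pixels. Subpixel positioning lets the rasterizer honor the fractional origins, so linear widths do not turn into uneven gaps.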
Now, as we discussed before, there may be valid reasons to not want to do that. Web app / CSS backward compatibility is one such reason. A long tail of hidden bugs arising from assumptions of integer segments in the codebase is another.
In the meantime, let’s see how else we can fix this problem...
As we demonstrated, the problem is rooted in the Chrome compositor doing a 2x scaling of the layout and expecting it to look right at twice the resolution. There is no inherent reason for handling high-density displays this way. I.e., instead of laying out at 120dpi and scaling the results, we can simply lay out at 240dpi. That is, after all, what the non-high-dpi-phobic would do. This possibility was explored in this WebKit bug. Here’s the mandatory screenshot:
Looks respectable. Let’s check the ‘i’s:
Note how the glyphs are taking non-integer widths in the nominal pixels now. This reflects the fact that they are hinted to whole physical pixels, each of which is 0.5 nominal pixel under this high-dpi mode.
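A small sketch of where the half-pixel nominal widths come from, assuming hinting is just rounding to whole physical pixels and using a hypothetical 0.22 em advance for ‘i’:

```python
import math

I_ADVANCE_EM = 0.22  # hypothetical unhinted advance of 'i', in em units

def nominal_width_hinted_at_physical(nominal_px, scale=2):
    """Hint the advance to whole physical pixels, then report it in
    nominal pixels (physical pixels divided by the scale factor)."""
    physical = math.floor(I_ADVANCE_EM * nominal_px * scale + 0.5)
    return physical / scale

widths = {size: nominal_width_hinted_at_physical(size) for size in range(8, 17)}
print(widths)
```

Every value is a multiple of 0.5 nominal pixels, and some (for example at 11px with this advance) are not whole numbers, so any code that assumes integer nominal widths would see fractions here.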
Now, some would argue that this has the same problems the previous solution had. And we agree to some extent. Plus, now layout is dependent on the device-scale-factor, which some may find unacceptable!
A.K.A., have your cake and eat it too. Ok, looks like we’re at a stalemate. There is one more thing we can try though: Hint glyph shapes and metrics for the low-resolution layout, then scale the results linearly. The resulting rendering is not optimal for the physical pixels of the screen, but is a legitimate approach. It just skews glyph shapes a bit unnecessarily. We explored that in this change. Screenshot:
This doesn’t look perfect, but is very respectable. Note the ‘rvi’ sequence: it’s cramped, but that’s what you get when you rasterize shapes hinted for one size at another! Let’s check the ‘i’s:
Note how the nominal width is exactly the same as the problematic case.
This solution looks better than it sounds. If you had asked me whether one could do this, I would have said you would get garbage. However, we don’t. The primary reason seems to be that we are using FreeType’s autohinter in the slight hinting mode. In the slight hinting mode, the strokes are not all necessarily snapped to full pixels (but the total glyph width is). This is good; otherwise we would have gotten rendering that is essentially bulky and two pixels wide all over. We did not. Here are two zoomed-in pictures of the ‘i’s to show that:
Properly hinted rasterization:
Our hybrid weirdo:
And no-hinting subpixel-positioned for comparison:
The most noticeable difference between the properly hinted version and our hybrid version seems to be in the hinting of the Y direction, which makes sense. In the hybrid, note that the autohinter has been trying hard to snap horizontal edges to whole nominal-pixel boundaries, which translates to two physical pixels in the rendering. I.e., the distance between the body of ‘i’ and the dot, the height of the body, etc., are mostly even numbers. The same is true of the full glyph width, but not of the individual features (the serifs, the stem width).
These nice properties will not hold for aggressively hinted fonts though. For example, bytecode-hinted fonts like Times New Roman, Tahoma, and Arial may simply render bulky using this method. We have not tested that.
What have we been smoking? Is the idea of hinting and then scaling so broken that one should not even explore it? Let’s see what other systems do when it comes to their approach to text rendering:
That last case is interesting and worth looking into. I’m assuming that Apple made this decision for backwards compatibility with web apps from when CSS was being written in exact numbers of pixels (prehistoric by today’s standards). Anyway, here are the numbers for Safari in the low-dpi mode:
Fair enough: they do hinting, except for very small sizes. That’s actually a good idea that we may want to implement in FreeType.
Now, let’s see what the numbers look like when we switch to the high-dpi mode, as shipped with the retina MacBook Pros:
Snap! Twice the size, but exactly the same numbers! Close inspection proved to me that Safari is indeed hinting metrics at the low-dpi mode and scaling by a factor of 2. In other words, no matter what size you try, you cannot get an ‘i’ glyph to consume an odd number of physical pixels on the retina display.
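The parity observation follows directly from hinting first and scaling second. The sketch below is a model of that behavior under stated assumptions, not Safari's implementation: hinting is rounding to whole pixels, and the 0.22 em advance is hypothetical.

```python
import math

I_ADVANCE_EM = 0.22  # hypothetical unhinted advance of 'i', in em units

def hint_then_scale(nominal_px, scale=2):
    """Physical width when metrics are hinted at the nominal size and
    the result is multiplied by the scale factor."""
    return scale * math.floor(I_ADVANCE_EM * nominal_px + 0.5)

def hint_at_physical(nominal_px, scale=2):
    """Physical width when metrics are hinted at the physical size."""
    return math.floor(I_ADVANCE_EM * nominal_px * scale + 0.5)

sizes = range(6, 41)
scaled = [hint_then_scale(s) for s in sizes]
direct = [hint_at_physical(s) for s in sizes]
print(all(w % 2 == 0 for w in scaled), any(w % 2 == 1 for w in direct))
```

Doubling a whole number can only give an even number, so the hint-then-scale widths are even at every size, whereas hinting at the physical resolution produces odd widths at some sizes.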
This is interesting. And very close to our hack! Now, if I were to explain the why, I think it’s because Apple does not like to break anyone’s layout, and wants to minimize the amount of testing needed by app developers. That is why they always wait for the technology to get there to switch to exactly twice the resolution / density, such that they can multiply everything by two without having to deal with rounding issues and seams. That makes some kind of sense given that they wanted to stay with hinted metrics for Safari even though the rest of OS X was not hinting metrics. And while it doesn’t look perfect, it does not look bad either:
As to the how, their approach seems to be different from, and superior to, ours. Let’s look at the close-up of the ‘i’s:
Note how even in the Y direction the shapes seem to be very linearly transformed and not snapped. This is in fact a result of different approaches to hinting between Apple and FreeType’s autohinter more than anything else. We can, if we want to, add a mode to FreeType to force hinting metrics for one resolution, but hint the actual stems for another resolution given the desired glyph width as a constraint. We don’t have any free software solution for performing that operation. It is not impossible to develop but it is not a trivial task either.
We presented three solutions to the same problem. Each has its own merits and shortcomings. Personally, I would recommend that for high-dpi devices we stick to full subpixel-positioned text (no metrics hinting), and only if that proves to be problematic consider switching to the hybrid solution.
For low-dpi devices we should stick with what we currently have, i.e., hinted text and no subpixel text positioning. That is, unless we can sort out our gamma and subpixel filtering issues, have great displays on devices we ship, and full subpixel text looks good on them. That’s not a very likely situation in the short term.
An interesting situation arises when you attach two displays to a device, one high-dpi and one low-dpi. Do we care about the layout on both displays being the same? If yes (and only if there’s a good reason for that answer), then if the main display (i.e., the laptop display) is low-dpi and the external display is high-dpi, we will end up using the hybrid approach on the external display. That’s one possible place where this may be useful even if we end up not using it for our main high-dpi display.
There are a few good reads on the net that explore some of these ideas in more detail:
[1] The wall will move with the bullet.