August 28th, 2012


Pango-HarfBuzz Merge and The Effects on Thai Module

One major change in GNOME 3.6 is the replacement of Pango’s shaper engines with HarfBuzz. Only the language engines (used for word-break analysis, for example) are retained. So I’m checking how this affects Thai/Lao rendering and what to do next.

Overall, Behdad has put a lot of effort into getting it right. Most Uniscribe behaviors have been reproduced for compatibility. He even cares enough to cover some widespread Thai fonts in which the script tag 'latn' is used instead of 'thai', as seen in Mozilla #719366. Unfortunately, this font set has been declared the standard in official documents, so the workaround seems inevitable.

Supported Fonts

In my experiments with some existing Thai OpenType fonts, the new Pango still renders them well, without regressions.

Loma font from fonts-tlwg (glyph positioning with GPOS, rearrangement with GSUB):

Loma on new Pango

Arundina Sans font from Fonts-SIPA-Arundina (positioning by substitution, only GSUB, no GPOS):

Arundina Sans on new Pango

But for legacy fonts without OpenType features, it renders badly:

Non-OpenType on new Pango

In addition, according to Behdad, PUA glyphs in legacy fonts are not supported yet. This means there will be regressions with fonts designed for Windows XP or earlier. Modern fonts designed for Windows 7 should be fine, though.

Changes on Bugs

Replacing the engine from scratch certainly affects existing bugs: some become obsolete, while others remain. Here is the summary for the Thai/Lao engine, as resolved upstream:

Closed bugs:

  • GNOME #616495 (Debian #620001) regarding Lao MAI KONG rendering, which was caused by a flaw in my code. I proposed a patch a while ago, but no action has been taken on it, and patched debs have been distributed locally as a workaround. With the HarfBuzz replacement, however, the bug is now gone.
  • GNOME #378001 regarding support for minority languages. I hadn’t worked on this because I was waiting for WTT 3.0, a local standard, to be drafted. Anyway, with the HarfBuzz replacement, the old WTT 2.0 clustering has been dropped and replaced with Unicode guidelines. Therefore, I assume it should now be possible to render minority languages written in Thai script, provided that the font has the required OpenType features.
  • GNOME #393307, #677090 regarding wrong rendering of zero-width characters like ZWJ and ZWSP. These bugs have also gone away with the HarfBuzz replacement.

Questionable bugs:

  • GNOME #583718 (Debian #620002) regarding the rendition of Thai SARA AM (U+0E33) on VTE with spurious dotted circles. So far, I have disagreed with Behdad on whether this bug should be treated along with Indic scripts. IMO, there is an easy path for Thai: render monospace fonts differently, which is also in accordance with widespread practice everywhere else, be it XTerm, Emacs, or framebuffer TTYs. But Behdad doesn’t like the idea and insists that it should be treated along with Indic scripts, which would complicate things a lot. So the bug has been there for many years. Meanwhile, I have also provided a workaround in the aforementioned patched debs.

    BTW, the situation has changed a little after the HarfBuzz replacement. First, let’s see the problem with the current Pango:

    Thai on VTE with current Pango

    One can easily spot the dotted circle glitches. And here is how I work around it, which matches how it's rendered on other terminals:

    Thai on VTE with patched Pango

    With the HarfBuzz replacement, here is how it renders:

    Thai on VTE with new Pango

    That is, although it’s still wrong, it’s more tolerable. So the question for users is: can they tolerate this until VTE is redesigned to support Indic scripts?
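
As background on why SARA AM in particular is awkward in a cell-based terminal, here is a small sketch using Python's standard `unicodedata` module. It only inspects the Unicode character data: U+0E33 has a compatibility decomposition into a combining mark (NIKHAHIT) followed by a spacing vowel (SARA AA). That the stray mark is what ends up on a dotted circle is my assumption about the glitch; the snippet says nothing about how Pango or VTE are actually implemented.

```python
import unicodedata

SARA_AM = "\u0e33"  # THAI CHARACTER SARA AM

# Compatibility decomposition as recorded in the Unicode Character Database.
decomposed = unicodedata.normalize("NFKD", SARA_AM)

# Name and general category of each resulting code point.
parts = [(unicodedata.name(ch), unicodedata.category(ch)) for ch in decomposed]

for name, category in parts:
    print(name, category)
```

The first part is a nonspacing mark (category `Mn`), which needs a base consonant to sit on, while the second is an ordinary spacing letter (category `Lo`).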

Remaining bugs:

  • GNOME #576156 (Debian #620004) regarding weird cursor movements caused by Unicode UAX #29. Amendments were pushed to Unicode from several different sources until one was finally accepted in Unicode 6.1.0. However, no action has been taken in Pango yet, so we still have to push it further. Again, a fix has also been provided in the patched debs.
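
To give a rough idea of what "cursor movement" means here, the sketch below splits a Thai string into cursor stops by attaching every nonspacing mark to the character before it. This is a deliberately simplified illustration and an assumption of mine: it is neither the UAX #29 algorithm nor Pango's code, and the function name `simple_clusters` is hypothetical.

```python
import unicodedata

def simple_clusters(text):
    """Group each nonspacing mark (category Mn) with the preceding character.

    Simplified illustration only; real grapheme cluster rules in
    UAX #29 cover many more cases than this.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch   # mark joins the previous cursor stop
        else:
            clusters.append(ch)  # everything else starts a new stop
    return clusters

# U+0E40 SARA E, U+0E01 KO KAI, U+0E49 MAI THO, U+0E32 SARA AA
word = "\u0e40\u0e01\u0e49\u0e32"
stops = simple_clusters(word)
print(len(stops), stops)
```

With this rule the four code points give three cursor stops: the leading vowel, the consonant with its tone mark, and the following vowel.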