You are viewing this page with JavaScript disabled. The work on this page was done in order to integrate it into the Processing.js JavaScript library, so if you want to see the final result, you'll have to enable JavaScript.

Let's make a small font

A story about creating a 528 byte TTF file, encoded as a 408 character javascript function that generates its BASE64 equivalent string.

updates:
28 July, 544 byte font, 456 byte generative javascript
30 July, 544 byte font, 424 byte generative javascript
1 August, 528 byte font, 408 byte generative javascript

Let's get started

Chances are you already know what a font is. It's something you select in a word processor, or text editor if you're hardcore, and write your text in. They're the things that make letters look different in documents, and in today's world, the web. For the longest time the web's been a bit of a limited font game, but in recent years "web fonts" have become more and more popular. The ability to load the font YOU want to use, rather than that "Times" font, or just "serif", has won a lot of people over (fun fact: IE's been able to do this since version 4. I know, who would have guessed, eh?)

What you probably don't know is that fonts are hell. There are a number of common formats, and none of them are what you'd describe as "easy to read". In order to make fonts small, the data inside a font has been encoded in the most spectacularly space-saving ways, after which a million and one features were tacked on top because different groups needed additional functionality from fonts: right-to-left writing, subroutines/substitutions so that compositional characters (such as nearly all CJK characters) take up less space, vertical metrics for Asian scripts, substitution pairs for letter combinations (the backbone of written Arabic); the list goes on and on. If you are thinking about going into fonts from a programming perspective, rather than a design perspective, I tip my hat to you; you are in for a rough ride.

However, should you persevere, you might end up where I am today: a head full of knowledge about things most people stay away from, and a sudden realisation that the thing you're trying to do is something you can actually do. I need the smallest possible OpenType font I can get, for some font detection work... can I simply make it myself?

Some back story: I've written my own OpenType font parser, with TrueType and Type2 support, and while writing it I've learned many things about font technologies. I used it for playing around with CJK character composition, and then after a while fonts slowly sank back into the background. Then one day, while talking about log visualisations, someone mentioned the Processing programming language. Having never heard of it, I looked it up, discovered it was about the best programming language in the world for visualisations, and then discovered it had a javascript port, too, which got me hooked.

I started helping out with Processing.js, and after a while I realised I could exploit the web's various technologies to mix visualisation in browsers with my font parser in the backend for a much more playful font interaction. And, of course, as these things go I became a dev for Pjs and started looking more and more into its font implementations for every font bug report we received. After we released v1.2.3, font handling desperately needed a full rewrite and I found myself back in a "fixing font things" position. And that brings us to today.

Today, I find myself in the position that we need to hold off on starting a Processing "sketch" (the Processing name for a program) on a web page until all fonts that should be preloaded have finished preloading. This means we can't just wait for the browser to finish downloading them; we also have to wait for the browser to finish loading them into memory for styling text on a page. If you think that waiting for the download should be enough: fonts are relatively big, complicated things. It can take a few hundred milliseconds between a font being done downloading and being fully loaded in memory if it's a few hundred kb, and for bigger, professional fonts in four different styles it can actually take more than a second between 'download complete' and 'font available for styling'.

I had already written a font detection tester page based on a tiny font, to see whether I could detect client-side font availability by comparing the widths of two divs, one with font-family "tiny", and one with font-family "YourFontHere, tiny". A simple timeout loop did the checking: as long as the two widths were the same, "YourFontHere" wasn't found, so it wasn't installed. I could port that idea to detecting load completion for @font-face fonts, but the reference font I used was big. Over 50kb. And that's way too big to bundle with a javascript library.

So we get to where we are now. I need something really small, with "never used" tiny metrics, so the best place to start is my original tiny font. It was made using FontForge, and has a small em-quad setting (30 units) and a tiny glyph (a 10x10 unit rectangle). This has been compacted (yay for FontForge!) and saved as TTF. Since I used the same glyph for 97 characters in the lower ASCII region, it's bigger than strictly necessary, and comes out at a file size of 66kb.

Is that small enough?

Now, that's okay, but do we really need all those letters? To be honest, we don't. So let's fire up TTX and kill everything except the "A". That'll do, really. Stripping everything that isn't ".notdef" or "A", then converting the .ttx file back to .ttf shows a massive reduction in size: a 1568 byte font! That's pretty good, and it'll load in Chrome, Firefox, Opera, Internet Explorer and Safari if you try to use it in an @font-face rule, but... I know fonts, and even though 1568 bytes sounds tiny, I read the "45 byte ELF executable" article; I know I can probably at the very least halve its size with "legal" changes to the file.

Pushing past 1568 bytes

So back to the TTX xml. There are several CMAP subtables, all for the same letter. Do we need those? Basically: no, we don't. If we only leave the {platformID 3, platEncID 1, language 0} format 4 subtable, then things still work. Excellent! Now the font's 1092 bytes.
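Why is that one subtable enough? A {3, 1} format 4 subtable maps Unicode code points to glyph indices through a sorted list of segments. Below is a minimal Python sketch of that lookup, fed with the 44-byte cmap table as it appears (at offset 392) in the hex dump of the 688 byte font further down. The function name `cmap4_lookup` is my own, and the sketch assumes a well-formed table rather than validating it.

```python
import struct

# cmap table copied from the 688 byte font's hex dump (offset 392, 44 bytes)
CMAP = bytes.fromhex(
    "00 00 00 01"                  # version 0, one subtable
    " 00 03 00 01 00 00 00 0C"     # platform 3, encoding 1, subtable at offset 12
    " 00 04 00 20 00 00"           # format 4, length 32, language 0
    " 00 04 00 04 00 01 00 00"     # segCountX2, searchRange, entrySelector, rangeShift
    " 00 41 FF FF"                 # endCode[]
    " 00 00"                       # reservedPad
    " 00 41 FF FF"                 # startCode[]
    " FF C0 00 01"                 # idDelta[] (-64, +1)
    " 00 00 00 00"                 # idRangeOffset[]
)

def cmap4_lookup(cmap: bytes, codepoint: int) -> int:
    """Return the glyph index for codepoint, or 0 (.notdef) if it is unmapped."""
    _version, num_tables = struct.unpack_from(">HH", cmap, 0)
    for i in range(num_tables):
        platform, encoding, offset = struct.unpack_from(">HHI", cmap, 4 + 8 * i)
        (fmt,) = struct.unpack_from(">H", cmap, offset)
        if (platform, encoding, fmt) != (3, 1, 4):
            continue
        (seg_x2,) = struct.unpack_from(">H", cmap, offset + 6)
        end_codes = offset + 14
        start_codes = end_codes + seg_x2 + 2      # skip the 2-byte reservedPad
        id_deltas = start_codes + seg_x2
        id_range_offsets = id_deltas + seg_x2
        for s in range(seg_x2 // 2):
            (end,) = struct.unpack_from(">H", cmap, end_codes + 2 * s)
            if codepoint > end:
                continue                          # segments are sorted by endCode
            (start,) = struct.unpack_from(">H", cmap, start_codes + 2 * s)
            if codepoint < start:
                return 0
            (delta,) = struct.unpack_from(">h", cmap, id_deltas + 2 * s)
            ro_pos = id_range_offsets + 2 * s
            (range_offset,) = struct.unpack_from(">H", cmap, ro_pos)
            if range_offset == 0:
                return (codepoint + delta) & 0xFFFF
            # otherwise the glyph index comes from glyphIdArray
            addr = ro_pos + range_offset + 2 * (codepoint - start)
            (glyph,) = struct.unpack_from(">H", cmap, addr)
            return (glyph + delta) & 0xFFFF if glyph else 0
    return 0

print(cmap4_lookup(CMAP, ord("A")))  # 1
print(cmap4_lookup(CMAP, ord("B")))  # 0
```

The idDelta of 0xFFC0 (that is, -64) is what turns 0x41 into glyph 1, and the second segment (0xFFFF to 0xFFFF) is the terminating segment that format 4 requires.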

But what are these mystery tables? "cvt", "gasp" and "FFTM" are all non-essential tables, so let's just remove them. Now it's starting to get interesting, because the new size is 1004 bytes. That's better, but only a little bit. Let's go prune the glyph definitions and see if we can fix some things there.

  <glyf>
    <TTGlyph name=".notdef" xMin="1" yMin="0" xMax="9" yMax="15">
      <contour>
        <pt x="1" y="0" on="1"/>
        <pt x="1" y="15" on="1"/>
        <pt x="9" y="15" on="1"/>
        <pt x="9" y="0" on="1"/>
      </contour>
      <contour>
        <pt x="2" y="1" on="1"/>
        <pt x="8" y="1" on="1"/>
        <pt x="8" y="14" on="1"/>
        <pt x="2" y="14" on="1"/>
      </contour>
      <instructions><assembly>
        </assembly></instructions>
    </TTGlyph>

    <TTGlyph name="A" xMin="0" yMin="0" xMax="1" yMax="1">
      <contour>
        <pt x="0" y="0" on="1"/>
        <pt x="0" y="1" on="1"/>
        <pt x="1" y="1" on="1"/>
        <pt x="1" y="0" on="1"/>
      </contour>
      <instructions><assembly>
        </assembly></instructions>
    </TTGlyph>
  </glyf>

What? I didn't define a .notdef outline, what's it doing there? And not only does it have some default outline, it has two contours! Pruning time:

  <glyf>
    <TTGlyph name=".notdef" xMin="0" yMin="0" xMax="0" yMax="0">
      <contour>
        <pt x="0" y="0" on="1"/>
      </contour>
      <instructions><assembly>
        </assembly></instructions>
    </TTGlyph>

    <TTGlyph name="A" xMin="0" yMin="0" xMax="1" yMax="1">
      <contour>
        <pt x="0" y="0" on="1"/>
        <pt x="1" y="1" on="1"/>
      </contour>
      <instructions><assembly>
        </assembly></instructions>
    </TTGlyph>
  </glyf>

And now it's 980 bytes. Excellent. Basically, the .notdef glyph needs an outline, but since I'll only use this font for the letter "A" and we'll never actually render .notdef, it can have a nonsense outline. The letter A, though, needs at least two coordinates for it to count as a real glyph.
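For the curious, here's what those two coordinates look like once compiled. The sketch below is a hypothetical Python decoder for simple (non-composite) TrueType glyphs; the 20 bytes are glyph 1 as it appears in the glyf table of the hex dump further down (18 bytes of glyph data plus 2 bytes of padding). It skips the hinting instructions and does no bounds checking.

```python
import struct

ON_CURVE, X_SHORT, Y_SHORT, REPEAT, X_SAME_OR_POS, Y_SAME_OR_POS = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

# glyph 1 ("A") from the glyf table in the hex dump: 18 data bytes + 2 padding bytes
GLYPH_A = bytes.fromhex("00 01 00 00 00 00 00 01 00 01  00 01 00 00  31 37  01 01  00 00")

def decode_simple_glyph(data: bytes):
    """Return [(x, y, on_curve), ...] for a simple TrueType glyph."""
    n_contours, _xmin, _ymin, _xmax, _ymax = struct.unpack_from(">5h", data, 0)
    if n_contours < 0:
        raise ValueError("composite glyphs are not handled in this sketch")
    end_pts = struct.unpack_from(f">{n_contours}H", data, 10)
    pos = 10 + 2 * n_contours
    (instruction_length,) = struct.unpack_from(">H", data, pos)
    pos += 2 + instruction_length                 # skip hinting instructions
    n_points = end_pts[-1] + 1 if end_pts else 0

    flags = []
    while len(flags) < n_points:
        flag = data[pos]
        pos += 1
        flags.append(flag)
        if flag & REPEAT:                         # next byte = extra repeat count
            flags.extend([flag] * data[pos])
            pos += 1

    def read_coords(short_bit, same_or_pos_bit):
        nonlocal pos
        value, coords = 0, []
        for flag in flags:
            if flag & short_bit:                  # 1-byte delta, sign given by the other bit
                delta = data[pos]
                pos += 1
                value += delta if flag & same_or_pos_bit else -delta
            elif not flag & same_or_pos_bit:      # 2-byte signed delta
                (delta,) = struct.unpack_from(">h", data, pos)
                pos += 2
                value += delta
            # else: same as the previous coordinate (delta 0)
            coords.append(value)
        return coords

    xs = read_coords(X_SHORT, X_SAME_OR_POS)
    ys = read_coords(Y_SHORT, Y_SAME_OR_POS)
    return [(x, y, bool(f & ON_CURVE)) for x, y, f in zip(xs, ys, flags)]

print(decode_simple_glyph(GLYPH_A))  # [(0, 0, True), (1, 1, True)]
```

The first flag byte, 0x31, says "on-curve, x unchanged, y unchanged", so the first point sits at (0,0). The second, 0x37, says "on-curve, one-byte positive x delta, one-byte positive y delta", and the two trailing 01 bytes supply those deltas.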

So... what else can we do... whoa, what's this NAME table?

  <name>
    <namerecord nameID="0" platformID="1" platEncID="0" langID="0x0">
      There is no copyright on this font.
    </namerecord>
    <namerecord nameID="1" platformID="1" platEncID="0" langID="0x0">
      Empty30
    </namerecord>
    <namerecord nameID="2" platformID="1" platEncID="0" langID="0x0">
      Medium
    </namerecord>
    <namerecord nameID="3" platformID="1" platEncID="0" langID="0x0">
      FontForge 2.0 : Empty30 : 5-10-2010
    </namerecord>
    <namerecord nameID="4" platformID="1" platEncID="0" langID="0x0">
      Empty30
    </namerecord>
    <namerecord nameID="5" platformID="1" platEncID="0" langID="0x0">
      Version 001.000 
    </namerecord>
    <namerecord nameID="6" platformID="1" platEncID="0" langID="0x0">
      Empty30
    </namerecord>
    <namerecord nameID="0" platformID="3" platEncID="1" langID="0x409">
      There is no copyright on this font.
    </namerecord>
    <namerecord nameID="1" platformID="3" platEncID="1" langID="0x409">
      Empty30
    </namerecord>
    <namerecord nameID="2" platformID="3" platEncID="1" langID="0x409">
      Medium
    </namerecord>
    <namerecord nameID="3" platformID="3" platEncID="1" langID="0x409">
      FontForge 2.0 : Empty30 : 5-10-2010
    </namerecord>
    <namerecord nameID="4" platformID="3" platEncID="1" langID="0x409">
      Empty30
    </namerecord>
    <namerecord nameID="5" platformID="3" platEncID="1" langID="0x409">
      Version 001.000 
    </namerecord>
    <namerecord nameID="6" platformID="3" platEncID="1" langID="0x409">
      Empty30
    </namerecord>
  </name>

That's a LOT of plain text information! ... You know what, let's just remove all of it and see what happens.

  <name>
    <namerecord nameID="0" platformID="1" platEncID="0" langID="0x0">
    </namerecord>
    <namerecord nameID="1" platformID="1" platEncID="0" langID="0x0">
    </namerecord>
    <namerecord nameID="2" platformID="1" platEncID="0" langID="0x0">
    </namerecord>
    <namerecord nameID="3" platformID="1" platEncID="0" langID="0x0">
    </namerecord>
    <namerecord nameID="4" platformID="1" platEncID="0" langID="0x0">
    </namerecord>
    <namerecord nameID="5" platformID="1" platEncID="0" langID="0x0">
    </namerecord>
    <namerecord nameID="6" platformID="1" platEncID="0" langID="0x0">
    </namerecord>
    <namerecord nameID="0" platformID="3" platEncID="1" langID="0x409">
    </namerecord>
    <namerecord nameID="1" platformID="3" platEncID="1" langID="0x409">
    </namerecord>
    <namerecord nameID="2" platformID="3" platEncID="1" langID="0x409">
    </namerecord>
    <namerecord nameID="3" platformID="3" platEncID="1" langID="0x409">
    </namerecord>
    <namerecord nameID="4" platformID="3" platEncID="1" langID="0x409">
    </namerecord>
    <namerecord nameID="5" platformID="3" platEncID="1" langID="0x409">
    </namerecord>
    <namerecord nameID="6" platformID="3" platEncID="1" langID="0x409">
    </namerecord>
  </name>

I can't believe that still works! IE doesn't complain, so obviously that's good enough! ... except it turns out Opera now rejects this font. I'll spare you the details: a version is required, but nothing else is, and it can be anything. Instead of picking just any value, I'll give this a version "@", which is ASCII value 0x40, or 01000000 in binary. Having those zeroes there is going to become important very soon.

So how are we doing on size? 688 bytes! Now we're getting somewhere! Sadly, that's also roughly the furthest we're going to get without removing things that break the font in Opera (first) or IE (a bit later). So let's do some parallel thinking. Instead of linking to this font, it's small enough to actually inject into a page using a data-URI, which for binary data like this means BASE64 encoding it. Funny thing about BASE64: it encodes stretches of zero bytes as sequences of the letter "A". So the more zero bytes we can get into the TTF file (without breaking it in any of the browsers), the more we can compress the BASE64 string, even if we can't make the TTF file itself any smaller.
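A quick check of that BASE64 claim, using Python's standard base64 module: every three zero bytes become "AAAA", and a stray non-zero bit only disturbs the characters it falls into. The helper name `b64` is just for illustration.

```python
import base64

def b64(data: bytes) -> str:
    """BASE64-encode bytes and return the result as text."""
    return base64.b64encode(data).decode("ascii")

print(b64(bytes(6)))           # AAAAAAAA  (each 3 zero bytes -> "AAAA")
print(b64(b"\x00\x01\x00"))    # AAEA      (one set bit disturbs one character)
print(b64(b"\x00\x40\x00"))    # AEAA      ("@" stored as UTF-16BE 00 40, plus a zero byte)
```

That's why zeroing bytes helps even when the file doesn't shrink: long runs of "A" are exactly the kind of repetition a generating function (or gzip) can exploit.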

Pushing past 688 bytes

As it turns out, almost EVERY value in the ttx dump can be set to zero. If you've ever had to wade through the OpenType spec, this is fantastically fascinating. Some values, however, must remain real values, although they don't necessarily have to be values that you would consider 'correct'.

Let's update the header table first:

  <head>
    <tableVersion value="1.0"/>
    <fontRevision value="1.0"/>
    <checkSumAdjustment value="0x908bda5e"/>
    <magicNumber value="0x5f0f3cf5"/>
    <flags value="00000000 00001011"/>
    <unitsPerEm value="32"/>
    <created value="Mon Jan 00 00:00:00 0000"/>
    <modified value="Mon Jan 00 00:00:00 0000"/>
    <xMin value="0"/>
    <yMin value="0"/>
    <xMax value="0"/>
    <yMax value="0"/>
    <macStyle value="00000000 00000000"/>
    <lowestRecPPEM value="0"/>
    <fontDirectionHint value="0"/>
    <indexToLocFormat value="0"/>
    <glyphDataFormat value="0"/>
  </head>

Note that we're basically lying here. The xMax and yMax values are 1, but an intelligent font engine will get those values from the glyph it's actually rendering, so we set them to 0. Zero bytes are nice. We leave the first six values as they are, because they're important, but the rest? Zeroes, all the way down.

Next up, MAXP.

  <maxp>
    <!-- Most of this table will be recalculated by the compiler -->
    <tableVersion value="0x00000"/>
    <numGlyphs value="0"/>
    <maxPoints value="0"/>
    <maxContours value="0"/>
    <maxCompositePoints value="0"/>
    <maxCompositeContours value="0"/>
    <maxZones value="0"/>
    <maxTwilightPoints value="0"/>
    <maxStorage value="0"/>
    <maxFunctionDefs value="0"/>
    <maxInstructionDefs value="0"/>
    <maxStackElements value="0"/>
    <maxSizeOfInstructions value="0"/>
    <maxComponentElements value="0"/>
    <maxComponentDepth value="0"/>
  </maxp>

While TTX says that most of this table will be recalculated, it doesn't hurt to make sure that the values it doesn't recalculate are zero, because the result is still a font that is accepted by all browsers.

Then, the OS/2 table. If you are old enough to remember the OS/2 Warp operating system, this name might be deceiving. Yes, that's what it was originally for, but the data in it is so damn useful that it would have been really silly to get rid of it. Downside: this is a required table and we can't just remove it. Don't worry, I tried. It doesn't work.

But we CAN set almost every damn value in it to zero, and it'll still count as a usable font. For a very specific purpose, but legal in all browsers, and that's what matters. In fact, except for the following values, everything else is 0. Including the entire "panose" structure:

  <OS_2>
    <version value="1"/>
    <xAvgCharWidth value="512"/>
    <usWeightClass value="512"/>
    <usWidthClass value="1"/>
    <achVendID value="noop"/>
    <fsSelection value="00000000 01000000"/>
    <fsFirstCharIndex value="35"/>
    <fsLastCharIndex value="35"/>
    <sTypoAscender value="1"/>
    <usWinAscent value="1"/>
  </OS_2>

Wait, did I just set the ulUnicodeRange and ulCodePageRange values to zero? And that's legal? ... wow. But, sure enough none of the browsers complain, and they apply the font just fine.

That basically leaves HMTX and POST. The first is really simple, and has two entries, one for ".notdef" and one for "A". Just to keep with the "lots of zeroes" idea, we set the width and lsb to zero, and job's done. Now for POST...

POST is an annoying table. It provides the information that printers need in order to print this font, but I don't want to print it. Removing it is not an option, because it's really really hardcore required. So it's the same trick: zeroes everywhere.

  <post>
    <formatType value="3.0"/>
    <italicAngle value="0.0"/>
    <underlinePosition value="0"/>
    <underlineThickness value="0"/>
    <isFixedPitch value="0"/>
    <minMemType42 value="0"/>
    <maxMemType42 value="0"/>
    <minMemType1 value="0"/>
    <maxMemType1 value="0"/>
  </post>

It's going to have to do. We now have a font that is so small, I can just show you what the byte layout is:

00 01 00 00 00 0A 00 80 | 00 03 00 20 4F 53 2F 32 | 71 95 70 D4 00 00 01 28 | 00 00 00 56 63 6D 61 70 | 
00 0C 00 74 00 00 01 88 | 00 00 00 2C 67 6C 79 66 | 01 04 62 39 00 00 01 BC | 00 00 00 24 68 65 61 64 | 
DE 06 54 5C 00 00 00 AC | 00 00 00 36 68 68 65 61 | 00 04 00 00 00 00 00 E4 | 00 00 00 24 68 6D 74 78 | 
00 00 00 00 00 00 01 80 | 00 00 00 06 6C 6F 63 61 | 00 12 00 08 00 00 01 B4 | 00 00 00 06 6D 61 78 70 | 
00 04 00 02 00 00 01 08 | 00 00 00 20 6E 61 6D 65 | 00 DF 1C AB 00 00 01 E0 | 00 00 00 B0 70 6F 73 74 | 
00 03 00 00 00 00 02 90 | 00 00 00 20 00 01 00 00 | 00 01 00 00 02 48 13 63 | 5F 0F 3C F5 00 0B 00 20 | 
00 00 00 00 B4 91 A2 80 | 00 00 00 00 CA 57 74 C6 | 00 00 00 00 00 01 00 01 | 00 00 00 00 00 00 00 00 | 
00 00 00 00 00 01 00 00 | 00 01 00 00 00 00 00 00 | 00 00 FF FF 00 01 00 00 | 00 00 00 00 00 00 00 00 | 
00 00 00 00 00 00 00 01 | 00 01 00 00 00 02 00 02 | 00 01 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | 
00 00 00 00 00 00 00 00 | 00 01 02 00 02 00 00 01 | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | 
00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | 
00 00 6E 6F 6F 70 00 40 | 00 23 00 23 00 01 00 00 | 00 00 00 01 00 00 00 00 | 00 00 00 00 00 00 00 00 | 
00 00 00 00 00 00 00 00 | 00 00 00 01 00 03 00 01 | 00 00 00 0C 00 04 00 20 | 00 00 00 04 00 04 00 01 | 
00 00 00 41 FF FF 00 00 | 00 41 FF FF FF C0 00 01 | 00 00 00 00 00 00 00 08 | 00 12 00 00 00 01 00 00 | 
00 00 00 00 00 00 00 00 | 00 00 31 00 00 01 00 00 | 00 00 00 01 00 01 00 01 | 00 00 31 37 01 01 00 00 | 
00 00 00 0E 00 AE 00 01 | 00 00 00 00 00 00 00 00 | 00 00 00 01 00 00 00 00 | 00 01 00 00 00 00 00 01 | 
00 00 00 00 00 02 00 00 | 00 00 00 01 00 00 00 00 | 00 03 00 00 00 00 00 01 | 00 00 00 00 00 04 00 00 | 
00 00 00 01 00 00 00 00 | 00 05 00 00 00 00 00 01 | 00 00 00 00 00 06 00 00 | 00 00 00 03 00 01 04 09 | 
00 00 00 00 00 00 00 03 | 00 01 04 09 00 01 00 00 | 00 00 00 03 00 01 04 09 | 00 02 00 02 00 00 00 03 | 
00 01 04 09 00 03 00 00 | 00 00 00 03 00 01 04 09 | 00 04 00 00 00 00 00 03 | 00 01 04 09 00 05 00 00 | 
00 00 00 03 00 01 04 09 | 00 06 00 00 00 00 00 40 | 00 03 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | 
00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 |

That's it. That's an entire OpenType font with TrueType outlines. But there are still parts there that look like they'd be much nicer if some "01" entries were "00". Let's whip out a hex editor and get cracking.

The font layout at this point is as follows:

  header:
    version:  0001.0000 , number of tables: 10, search range: 128, entry selector: 3, range shift: 32

  tables:
    name: head, checkSum: -570010532, offset: 172, length: 54
    name: hhea, checkSum:     262144, offset: 228, length: 36
    name: maxp, checkSum:     262146, offset: 264, length: 32
    name: OS/2, checkSum: 1905619156, offset: 296, length: 86
    name: hmtx, checkSum:          0, offset: 384, length: 6
    name: cmap, checkSum:     786548, offset: 392, length: 44
    name: loca, checkSum:    1179656, offset: 436, length: 6
    name: glyf, checkSum:   17064505, offset: 444, length: 36
    name: name, checkSum:   14621867, offset: 480, length: 176
    name: post, checkSum:     196608, offset: 656, length: 32
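The directory listing above can be reproduced from the first 172 bytes of the dump (a 12-byte offset table followed by ten 16-byte table records). Here's a minimal Python sketch; `read_table_directory` is a hypothetical helper, and it also checks that searchRange, entrySelector and rangeShift match the values the spec derives from the table count (128, 3 and 32 for ten tables).

```python
import struct

# First 172 bytes of the 688 byte font: the 12-byte offset table plus
# ten 16-byte table records (bytes.fromhex skips newlines on Python 3.7+)
HEADER = bytes.fromhex("""
00 01 00 00 00 0A 00 80 00 03 00 20 4F 53 2F 32 71 95 70 D4 00 00 01 28 00 00 00 56 63 6D 61 70
00 0C 00 74 00 00 01 88 00 00 00 2C 67 6C 79 66 01 04 62 39 00 00 01 BC 00 00 00 24 68 65 61 64
DE 06 54 5C 00 00 00 AC 00 00 00 36 68 68 65 61 00 04 00 00 00 00 00 E4 00 00 00 24 68 6D 74 78
00 00 00 00 00 00 01 80 00 00 00 06 6C 6F 63 61 00 12 00 08 00 00 01 B4 00 00 00 06 6D 61 78 70
00 04 00 02 00 00 01 08 00 00 00 20 6E 61 6D 65 00 DF 1C AB 00 00 01 E0 00 00 00 B0 70 6F 73 74
00 03 00 00 00 00 02 90 00 00 00 20
""")

def read_table_directory(font: bytes):
    """Return (sfnt_version, {tag: (checksum, offset, length)})."""
    version, num_tables, search_range, entry_selector, range_shift = struct.unpack_from(">IHHHH", font, 0)
    # searchRange, entrySelector and rangeShift are all derived from numTables
    power = 1 << (num_tables.bit_length() - 1)    # largest power of two <= numTables
    expected = (16 * power, power.bit_length() - 1, 16 * num_tables - 16 * power)
    if (search_range, entry_selector, range_shift) != expected:
        raise ValueError("inconsistent binary-search fields")
    tables = {}
    for i in range(num_tables):
        tag, checksum, offset, length = struct.unpack_from(">4sIII", font, 12 + 16 * i)
        tables[tag.decode("latin-1")] = (checksum, offset, length)
    return version, tables

version, tables = read_table_directory(HEADER)
for tag, (checksum, offset, length) in sorted(tables.items(), key=lambda item: item[1][1]):
    signed = checksum - (1 << 32) if checksum >= 1 << 31 else checksum   # match the listing
    print(f"name: {tag}, checkSum: {signed:>10}, offset: {offset}, length: {length}")
```

The listing prints checksums as signed 32-bit integers, which is why head shows up as -570010532 rather than 0xDE06545C; the sketch converts them the same way so its output lines up with the listing.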

So let's do some marking. Using the offsets and lengths from the table directory, the dump above splits into the following byte ranges (each row of the dump holds 32 bytes, and the 2-byte gaps between some tables are zero padding, because every table starts on a 4-byte boundary):

  bytes   0-171  offset table and table directory
  bytes 172-225  head
  bytes 228-263  hhea
  bytes 264-295  maxp
  bytes 296-381  OS/2
  bytes 384-389  hmtx
  bytes 392-435  cmap
  bytes 436-441  loca
  bytes 444-479  glyf
  bytes 480-655  name
  bytes 656-687  post

Okay, so now that things are readable, let's get byte-whacking. First, let's deal with that HEAD table. There are lots of non-zero bytes in it (the checksum adjustment, the magic number, the creation and modification timestamps). It would be nice if we could zero those, but sadly, we can't. The HEAD table is pretty much invulnerable when it comes to byte sniping. Moving on, let's try the HHEA table instead. This is its layout:

  00 01 00 00 version number "01.00" - we can't touch this
        00 01 Ascender... this can be made 0
        00 00 descender
        00 00 lineGap
        00 00 advanceWidthMax
        00 00 minLeftSideBearing
        FF FF minRightSideBearing... this can be made 0
        00 01 xMaxExtent... we can lie, and make this 0
        00 00 caretSlopeRise
        00 00 caretSlopeRun
        00 00 caretOffset
        00 00 reserved
        00 00 reserved
        00 00 reserved
        00 00 reserved
        00 00 metricDataFormat
        00 01 numberOfHMetrics - we can't touch this either

As long as we don't touch the version number and the number of metrics, we can create a nice stretch of zeroes. On to the MAXP table.
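Expressed as code, the zeroed HHEA is just a version, fifteen zeroed 16-bit fields, and the metrics count. A minimal Python sketch, assuming (as found above) that only those two fields need real values; which zeroed fields a renderer tolerates is the empirical part, not something the spec promises. `build_hhea` is a hypothetical helper name.

```python
import struct

def build_hhea(number_of_hmetrics: int) -> bytes:
    """hhea with every field zeroed except version 1.0 and numberOfHMetrics."""
    # version (Fixed, 4 bytes) + 15 two-byte fields (ascender ... metricDataFormat)
    # + numberOfHMetrics (uint16) = 36 bytes
    return struct.pack(">I15hH", 0x00010000, *([0] * 15), number_of_hmetrics)

hhea = build_hhea(1)
print(len(hhea), hhea.count(0))   # 36 34
print(hhea.hex(" "))              # bytes.hex(sep) needs Python 3.8+
```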

  00 01 00 00 version number "01.00" - we don't want to touch this
        00 02 numGlyphs
        00 02 maxPoints
        00 01 maxContours
        00 00
        .. ..
        00 00  

Sadly, we can't touch these values either. At least not without breaking the font in Opera. Boo! Oh well, onward. And by onward, I mean let's skip OS/2, because it's too important. And HMTX is already all zeroes. We can't touch CMAP, because everything in it is needed to map "A" to its glyph, so on to LOCA. Here's where things get interesting.

LOCA is the "Index to Location" table. It says which offset, relative to the beginning of the GLYF table, to use for which glyph. Because the head table's indexToLocFormat is 0, these are "short" offsets, stored divided by two. It currently looks like this:

    00 00
    00 08
    00 12  

But... if we want all the glyphs to look the same, can we just make them all 00? Wouldn't that make everything point to glyph 0? In theory, yes. But sadly this time Opera, Chrome and Firefox no longer consider this a legal font. So LOCA stays the same. We can't make it 00 00 00 08 00 08 either, because then Opera refuses to load it. Damn you, browsers.
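To see what those LOCA numbers mean: with indexToLocFormat set to 0 in the head table, each entry is a 16-bit offset divided by two, and glyph i occupies the glyf bytes from entry i to entry i+1. A small Python sketch (`glyph_ranges` is a hypothetical name):

```python
import struct

def glyph_ranges(loca: bytes, num_glyphs: int, index_to_loc_format: int = 0):
    """Return (start, end) byte ranges into the glyf table for each glyph."""
    if index_to_loc_format == 0:      # short format: uint16 offsets, stored divided by 2
        offsets = [2 * v for v in struct.unpack_from(f">{num_glyphs + 1}H", loca)]
    else:                             # long format: uint32 offsets, stored as-is
        offsets = list(struct.unpack_from(f">{num_glyphs + 1}I", loca))
    return list(zip(offsets, offsets[1:]))

print(glyph_ranges(bytes.fromhex("00 00 00 08 00 12"), num_glyphs=2))  # [(0, 16), (16, 36)]
```

So .notdef is 16 bytes and "A" is 20 bytes, ending exactly at the 36-byte length of the glyf table. A glyph whose start and end are equal has no outline at all, so 00 00 00 08 00 08 would turn "A" into an empty glyph, which may be why the browsers balked at that variant.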

Next up, the GLYF table. We're not touching that.

So that leaves the NAME and POST tables. Last one first: There is an annoying "03" element that we'd like to set to "00". Sadly for us, this time Chrome complains. So POST stays the way it is.

That leaves NAME...

00 00   table format 0. We don't touch this.
00 0E   number of entries in this table: 14
00 AE   start of the "strings" section, counted from the start of this table: 174 = 6 header bytes + 14 records of 12 bytes

This is then followed by the "name records", each consisting of six 16-bit values: {platformID, encodingID, languageID, nameID, length, offset}. In order to make this table compact, all strings are stored in the last part of the table, and entries that use the same string simply point to the same byte position in that last section.

 1) 00 01 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
 2) 00 01 | 00 00 | 00 00 | 00 01 | 00 00 | 00 00
 3) 00 01 | 00 00 | 00 00 | 00 02 | 00 00 | 00 00
 4) 00 01 | 00 00 | 00 00 | 00 03 | 00 00 | 00 00
 5) 00 01 | 00 00 | 00 00 | 00 04 | 00 00 | 00 00
 6) 00 01 | 00 00 | 00 00 | 00 05 | 00 00 | 00 00
 7) 00 01 | 00 00 | 00 00 | 00 06 | 00 00 | 00 00
 8) 00 03 | 00 01 | 04 09 | 00 00 | 00 00 | 00 00
 9) 00 03 | 00 01 | 04 09 | 00 01 | 00 00 | 00 00
10) 00 03 | 00 01 | 04 09 | 00 02 | 00 02 | 00 00
11) 00 03 | 00 01 | 04 09 | 00 03 | 00 00 | 00 00
12) 00 03 | 00 01 | 04 09 | 00 04 | 00 00 | 00 00
13) 00 03 | 00 01 | 04 09 | 00 05 | 00 00 | 00 00
14) 00 03 | 00 01 | 04 09 | 00 06 | 00 00 | 00 00
    00 40

The "00 40" at the end is the entire string table. Remember, we had to set a version, and picked "@", which was ASCII character 0x40... there it is. So immediately, let's set the length value "02" for record 10 to "00", and move the "40" up. Excellent! Except now it won't work in Opera. What if we just keep the record as is, but set the string table to 00 00?

Well, turns out that's just fine. Removing the value in the TTX xml wasn't allowed, but setting both of its bytes to zero, as long as they're there, is perfectly acceptable.

But what about those useless records? Can we just carpet-bomb them with zeroes? Well... yes O_o

 1) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
 2) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
 3) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
 4) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
 5) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
 6) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
 7) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
 8) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
 9) 00 03 | 00 01 | 04 09 | 00 01 | 00 00 | 00 00
10) 00 03 | 00 01 | 04 09 | 00 02 | 00 02 | 00 00
11) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
12) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
13) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
14) 00 00 | 00 00 | 00 00 | 00 00 | 00 00 | 00 00
    00 00

Holy something from somewhere! Wait, doesn't that mean there are only 2 records? 9, and 10? Let's go and prune this table so that it looks like this:

    00 00   (table format 0)
    00 02   (2 entries in this table)
    00 1E   (start of the "strings" section for this table: 0 + 6 + 24 = 30 = 0x1E)
 1) 00 03 | 00 01 | 04 09 | 00 01 | 00 00 | 00 00
 2) 00 03 | 00 01 | 04 09 | 00 02 | 00 02 | 00 00
    00 00

Note that as of this change, this font can no longer be converted to .ttx and then back to .ttf, because as far as TTX is concerned, the name table is now malformed, even though technically it isn't.

Of course, this change will move the POST table up, so we need to correct its offset value. And the stored checksum for the NAME table will no longer match, so we need to recompute it:

  NAME: 00 00 00 02 | 00 1E 00 03 | 00 01 04 09 | 00 01 00 00 | 00 00 00 03 | 00 01 04 09 | 00 02 00 02 | 00 00 00 00
  offset: 480
  length: 32
  checksum: ...

How do we calculate the checksum? Thankfully the OpenType specification is really clear on this:

  Table checksums are the unsigned sum of the longs of a given table.
  In C, the following function can be used to determine a checksum:

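  #include <stdint.h>

  typedef uint32_t ULONG;

  /* Table data is stored big-endian: on little-endian hardware each
     value must be byte-swapped before it is added. */
  ULONG CalcTableChecksum(ULONG *Table, ULONG Length)
  {
    ULONG Sum = 0L;
    ULONG *EndPtr = Table+((Length+3) & ~3) / sizeof(ULONG);
    while (Table < EndPtr)
      Sum += *Table++;
    return Sum;
  }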

Fair enough. It's only 32 bytes, or 8 ULONG values, so the checksum is the sum of these eight values. Let's just do this by hand: 0x2 + 0x1E0003 + 0x10409 + 0x10000 + 0x3 + 0x10409 + 0x20002 + 0x0 = 0x23081C (= 2295836 decimal).
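The same arithmetic in Python, together with a sketch of how the pruned 32-byte NAME table is put together. Both helpers (`build_name_table`, `table_checksum`) are hypothetical names; `table_checksum` follows the spec's definition quoted above: pad to a multiple of 4 bytes, then sum the big-endian 32-bit values modulo 2^32.

```python
import struct

def build_name_table(records, string_data: bytes) -> bytes:
    """Format-0 name table; records are (platform, encoding, language, nameID, length, offset)."""
    header = struct.pack(">HHH", 0, len(records), 6 + 12 * len(records))
    body = b"".join(struct.pack(">6H", *record) for record in records)
    return header + body + string_data

def table_checksum(table: bytes) -> int:
    """Unsigned 32-bit sum of big-endian ULONGs, zero-padded to a multiple of 4 bytes."""
    padded = table + bytes(-len(table) % 4)
    total = 0
    for (value,) in struct.iter_unpack(">I", padded):
        total = (total + value) & 0xFFFFFFFF
    return total

NAME = build_name_table(
    [(3, 1, 0x409, 1, 0, 0),     # nameID 1: zero-length string
     (3, 1, 0x409, 2, 2, 0)],    # nameID 2: 2 bytes at string offset 0
    b"\x00\x00",                 # the zeroed "@" (UTF-16BE)
)
print(len(NAME), hex(table_checksum(NAME)))   # 32 0x23081c
```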

  NAME: 00 00 00 02 00 1E 00 03 00 01 04 09 00 01 00 00 00 00 00 03 00 01 04 09 00 02 00 02 00 00 00 00
  checksum: 2295836 (= [00 23 08 1C])
  offset: 480
  length: 32 (= 0x20)
  
  POST: content is the same
  checksum: the same as before
  offset: 656 - (176-32) = 512 (=0x200)
  length: the same as before

Does this still work? You bet. And it kills off loads of data. We now have a 544 byte OpenType font with truetype outlines that is accepted by all five browsers!
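The new POST offset follows from laying the tables out back to back, with each table starting on a 4-byte boundary (the spec pads the space between tables with zeroes). A minimal Python sketch, assuming the tables stay in the same order as in the directory listing; `layout_tables` is a hypothetical helper:

```python
def layout_tables(lengths, first_offset):
    """Place tables back to back; return ({tag: offset}, total file size)."""
    offsets, pos = {}, first_offset
    for tag, length in lengths:
        offsets[tag] = pos
        pos += (length + 3) & ~3      # table data is padded to a multiple of 4 bytes
    return offsets, pos

# (tag, length) in file order, after pruning the name table to 32 bytes
LENGTHS = [("head", 54), ("hhea", 36), ("maxp", 32), ("OS/2", 86), ("hmtx", 6),
           ("cmap", 44), ("loca", 6), ("glyf", 36), ("name", 32), ("post", 32)]

# the data starts right after the 12-byte offset table and the 16-byte table records
offsets, total = layout_tables(LENGTHS, first_offset=12 + 16 * len(LENGTHS))
print(offsets["name"], offsets["post"], total)   # 480 512 544
```

Swapping in a 16-byte POST table, as done in the August 1 update, gives 528 bytes the same way.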

July 30 Update

But wait... is that checksum actually necessary? What if I just set all of them to zero? That would create some more repetitive data, right? Sure enough, it turns out the checksums are like traffic lights: they're more of a guideline that you can completely ignore when it's safe to do so. Since fonts can be huge, and verifying that every table in a font matches its checksum can take seconds for large fonts (average CJK font size on my computer: 12MB), browsers are (rightly) not going to bother verifying this value.

Here's where things get annoying. While you'd think that font engines don't rely on the order in which the tables are listed in the header, since tables aren't guaranteed to be sequential, it turns out that they don't like it when table offsets make one table's content overlap with another's. This sucks, because we could easily move the HMTX table to offset [01 03], saving 6 bytes, and alter the POST table so that it's version [00 01] instead of [00 03], which would make it identical, byte for byte, to the MAXP table, so we could have set its offset to the same as MAXP. That would have saved us another 32 bytes.
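The overlap rule is easy to check before you start testing in browsers. Below is a hypothetical Python helper that flags any two tables whose byte ranges overlap; it's a sanity check under the assumption that overlap is what the font engines object to, not a reproduction of any browser's actual validation code.

```python
def overlapping_tables(directory):
    """Return (tag, tag) pairs whose [offset, offset + length) ranges overlap."""
    spans = sorted((offset, offset + length, tag)
                   for tag, (offset, length) in directory.items() if length > 0)
    clashes = []
    for i, (_start_a, end_a, tag_a) in enumerate(spans):
        for start_b, _end_b, tag_b in spans[i + 1:]:
            if start_b >= end_a:
                break            # spans are sorted by start, so later ones can't overlap tag_a
            clashes.append((tag_a, tag_b))
    return clashes

# offsets/lengths from the 544 byte layout, then the tempting "post shares maxp" variant
layout = {"maxp": (264, 32), "name": (480, 32), "post": (512, 32)}
shared = {**layout, "post": (264, 32)}
print(overlapping_tables(layout))   # []
print(overlapping_tables(shared))   # [('maxp', 'post')]
```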

Of course, because setting the POST table to [00 01] still makes the data more repetitive, we add that change.

August 1 Update

This pretty much covers everything we can 'legally' do with the font. We pruned away things we won't be using, and set values we couldn't remove to zero. But can we prune even more data by "liberally" interpreting the OpenType specification?

Pushing 16 bytes past 544 bytes

For instance, let's look at the POST table some more. I don't need it, but OpenType says it has to be there. But what do the footnotes say?

"The last four entries in the table are present because PostScript drivers can do better memory management if the virtual memory (VM) requirements of a downloadable OpenType font are known before the font is downloaded. This information should be supplied if known. If it is not known, set the value to zero. The driver will still work but will be less efficient."

Interesting. Are font engines smart enough to assume 0 values if the values don't even exist? Turns out: yes, they are. Changing the table header so that the POST length is 0x10 instead of 0x20, and cutting the last 16 zero-bytes off the table gives us a 528 byte font that is still accepted by all current versions of the five big browsers.
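This is roughly the "assume zero if missing" behaviour the footnote invites. A Python sketch of a tolerant POST header reader; the field names follow the spec, while `read_post_header` is hypothetical, and real font engines may handle short tables differently.

```python
import struct

# post header fields and their big-endian struct codes (32 bytes in total)
POST_FIELDS = [("formatType", "I"), ("italicAngle", "i"), ("underlinePosition", "h"),
               ("underlineThickness", "h"), ("isFixedPitch", "I"),
               ("minMemType42", "I"), ("maxMemType42", "I"),
               ("minMemType1", "I"), ("maxMemType1", "I")]

def read_post_header(post: bytes) -> dict:
    """Read the post header, treating fields past the end of the table as 0."""
    values, pos = {}, 0
    for name, code in POST_FIELDS:
        size = struct.calcsize(">" + code)
        if pos + size <= len(post):
            (values[name],) = struct.unpack_from(">" + code, post, pos)
        else:
            values[name] = 0          # missing trailing field: "not known", so 0
        pos += size
    return values

# the 16-byte table: format 1.0 (as set in the July 30 update), everything else zero
print(read_post_header(bytes.fromhex("00 01 00 00") + bytes(12)))
```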

And increasing compressibility just a bit more

Finally, we also optimise the OS/2 table. I didn't want to do this earlier, because it's a pretty fragile table, but at this point I've run out of things I can do to the font by reading the spec, so it's "try and see what happens" time.

First, we set the xAvgCharWidth to zero. This is legal (if very strange), and improves compressibility.

Then we set the usWeightClass to one, which is the lowest non-zero value we can legally assign it.

We also blank the Font Vendor Identification string. Leaving it blank in TTX results in the string "noop" in the byte code, but since it's not referenced, we can replace this with [00 00 00 00] and further improve compressibility.

We then unset all the fsSelection bits, because they are irrelevant. This font will never be used in a classification system, so whatever those bits say is arbitrary. That's two more zero bytes.

Continuing on, we set the typographic ascender to zero. Its main use is to determine a correct default line height, and as this font will never be used for multi-line content, an effective line height of zero is just fine.

Finally, there was the suggestion from the OpenType mailing list to try using a version 0 OS/2 table. This version is much, much older, but also much smaller. Oddly enough, or perhaps annoyingly enough, none of the browsers respect the OS/2 table version number, and they assume that it's version 5 (which uses byte 00 04... don't ask). The upside is that it means that as a last optimisation, we can safely set the version number to [00 00] and gain that last bit of extra compressibility.

We now have a 528 byte TTF that will compress quite ridiculously well. Stand-alone gzipping using -9 turns it into a 242 byte file, so let's see how small we can get it as a page-embedded string.

Putting it on a web page - BASE64 and beyond

We now have our base font, and it's tiny:

00 01 00 00 00 0A 00 80 | 00 03 00 20 4F 53 2F 32 | 00 00 00 00 00 00 01 28 | 00 00 00 56 63 6D 61 70
00 00 00 00 00 00 01 88 | 00 00 00 2C 67 6C 79 66 | 00 00 00 00 00 00 01 BC | 00 00 00 24 68 65 61 64
00 00 00 00 00 00 00 AC | 00 00 00 38 68 68 65 61 | 00 00 00 00 00 00 00 E4 | 00 00 00 24 68 6D 74 78
00 00 00 00 00 00 01 80 | 00 00 00 06 6C 6F 63 61 | 00 00 00 00 00 00 01 B4 | 00 00 00 06 6D 61 78 70
00 00 00 00 00 00 01 08 | 00 00 00 20 6E 61 6D 65 | 00 00 00 00 00 00 01 E0 | 00 00 00 20 70 6F 73 74
00 00 00 00 00 00 02 00 | 00 00 00 10 00 01 00 00 | 00 01 00 00 02 48 13 63 | 5F 0F 3C F5 00 0B 00 20
00 00 00 00 B4 91 A2 80 | 00 00 00 00 CA 57 74 C6 | 00 00 00 00 00 01 00 01 | 00 00 00 00 00 00 00 00
00 00 00 00 00 01 00 00 | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 01 | 00 01 00 00 00 02 00 02 | 00 01 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 | 00 00 00 00 00 01 00 01 | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 | 00 23 00 23 00 00 00 00 | 00 00 00 01 00 00 00 00 | 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 | 00 00 00 01 00 03 00 01 | 00 00 00 0C 00 04 00 20 | 00 00 00 04 00 04 00 01
00 00 00 41 FF FF 00 00 | 00 41 FF FF FF C0 00 01 | 00 00 00 00 00 00 00 08 | 00 12 00 00 00 01 00 00
00 00 00 00 00 00 00 00 | 00 00 31 00 00 01 00 00 | 00 00 00 01 00 01 00 01 | 00 00 31 37 01 01 00 00
00 00 00 02 00 1E 00 03 | 00 01 04 09 00 01 00 00 | 00 00 00 03 00 01 04 09 | 00 02 00 02 00 00 00 00
00 01 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00

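That's a complete TTF file. But see all those zeroes? Let's turn those into A's by encoding this binary file using BASE64, so that it can be used in a data URI embedded on a webpage. BASE64 writes every 6 bits as one character, and six zero bits map to 'A', so each group of three zero bytes that lines up with BASE64's 3-byte blocks comes out as "AAAA".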

AAEAAAAKAIAAAwAgT1MvMgAAAAAAAAEoAAAAVmNtYXAAAAAAAAABiAAAACxnbHlmAAAAAAAAAbwA
AAAkaGVhZAAAAAAAAACsAAAAOGhoZWEAAAAAAAAA5AAAACRobXR4AAAAAAAAAYAAAAAGbG9jYQAA
AAAAAAG0AAAABm1heHAAAAAAAAABCAAAACBuYW1lAAAAAAAAAeAAAAAgcG9zdAAAAAAAAAIAAAAA
EAABAAAAAQAAAkgTY18PPPUACwAgAAAAALSRooAAAAAAyld0xgAAAAAAAQABAAAAAAAAAAAAAAAA
AAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAEAAAACAAIAAQAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAEAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAACMAIwAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAMA
AQAAAAwABAAgAAAABAAEAAEAAABB//8AAABB////wAABAAAAAAAAAAgAEgAAAAEAAAAAAAAAAAAA
AAAxAAABAAAAAAABAAEAAQAAMTcBAQAAAAAAAgAeAAMAAQQJAAEAAAAAAAMAAQQJAAIAAgAAAAAA
AQAAAAAAAAAAAAAAAAAA

That's a 704 byte string (BASE64 turns every 3 bytes into 4 characters, so 528 bytes become 704 characters), which seems like way more than the TTF file we started with... but it's hard to miss all that repetition in the string. You can see why making as many values 00 as possible was a good idea: there are some long stretches of A in there, and that means we can save a lot of space by reversibly replacing them with shorter strings. The following javascript function excels at that task:

// The BASE64 compression function, which replaces every run of four or more A's with "-{number of A's}.".
function compressBASE64(str) {
  var replacer = function(series) {
    return "-" + series.length + ".";
  };
  return str.replace(/AAAA+/g, replacer);
}

This turns a string of, for instance, 16 A's into "-16.", which happens to be the string representation of the number of A's, just negative (plus a trailing period). Runs of three or fewer A's are left alone, since a marker like "-3." would be no shorter than "AAA". Best of all, it uses two characters that are not used in BASE64 ('-' and '.'), so we can easily find these placeholders when 'reconstituting' the original BASE64 data. So what does running our BASE64 string through this function give us?

AAE-4.KAIAAAwAgT1MvMg-8.Eo-4.VmNtYX-9.Bi-4.CxnbHlm-9.bw-4.kaGVhZ-9.Cs-4.OGho
ZWE-9.5-4.CRobXR4-9.Y-5.GbG9jYQ-8.G0-4.Bm1heH-9.BC-4.CBuYW1l-9.e-5.gcG9zd-9.
I-5.EAAB-5.QAAAkgTY18PPPUACwAg-5.LSRoo-6.yld0xg-7.QAB-18.E-44.BAAE-4.CAAIAAQ
-36.EAAQ-75.CMAIw-9.B-31.BAAMAAQ-4.wABAAg-4.BAAEAAEAAABB//8AAABB////wAAB-10.
gAEg-4.E-16.xAAAB-7.BAAEAAQAAMTcBAQ-7.gAeAAMAAQQJAAE-7.MAAQQJAAIAAg-7.Q-18.

That's a 379 byte string, almost 150 bytes smaller than our TTF.
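To get the original BASE64 back, each "-{n}." marker has to be expanded into n A's again. A minimal sketch of that inverse is below; decompressBASE64 is a name made up for illustration, and the compressor is repeated so the snippet runs on its own. The sample is the start of our BASE64 string.

```javascript
// Hypothetical inverse of compressBASE64: expand every "-{n}." marker into n A's.
function decompressBASE64(str) {
  return str.replace(/-(\d+)\./g, function (marker, count) {
    // new Array(n + 1).join("A") yields exactly n A's
    return new Array(parseInt(count, 10) + 1).join("A");
  });
}

// Same compressor as above: runs of four or more A's become "-{n}.".
function compressBASE64(str) {
  return str.replace(/AAAA+/g, function (series) {
    return "-" + series.length + ".";
  });
}

var sample = "AAEAAAAKAIAAAwAgT1MvMg" + "AAAAAAAA" + "Eo"; // 8 A's in the middle
var packed = compressBASE64(sample);
console.log(packed);                              // AAE-4.KAIAAAwAgT1MvMg-8.Eo
console.log(decompressBASE64(packed) === sample); // true
```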

Pushing past 379 bytes

But is that as small as it gets? See those 'AA' strings? They recur a lot, so let's replace each one with a comma.

,E-4.KAI,AwAgT1MvMg-8.Eo-4.VmNtYX-9.Bi-4.CxnbHlm-9.bw-4.kaGVhZ-9.Cs-4.OGhoZW
E-9.5-4.CRobXR4-9.Y-5.GbG9jYQ-8.G0-4.Bm1heH-9.BC-4.CBuYW1l-9.e-5.gcG9zd-9.I-
5.E,B-5.Q,AkgTY18PPPUACwAg-5.LSRoo-6.yld0xg-7.QAB-18.E-44.B,E-4.C,I,Q-36.E,Q
-75.CMAIw-9.B-31.B,M,Q-4.wAB,g-4.B,E,E,ABB//8,ABB////w,B-10.gAEg-4.E-16.x,AB
-7.B,E,Q,MTcBAQ-7.gAe,M,QQJ,E-7.M,QQJ,I,g-7.Q-18.

353 bytes. That's small enough that we can add ".replace(/,/g,'AA')" (19 characters) to undo the substitution and still end up smaller than 379 bytes: 353 + 19 = 372. Only 7 bytes saved, but at this point every single byte counts.
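The comma layer is just one more reversible substitution on top of the markers. Here's a small sketch of both directions plus the byte accounting from the paragraph above; the function names are made up for illustration. It works because a comma can never occur in BASE64 output, and replacing every comma with "AA" exactly undoes the left-to-right "AA" to "," replacement, even for odd runs such as "AAA".

```javascript
// Hypothetical helpers for the second substitution layer.
function addCommaLayer(str) {
  return str.replace(/AA/g, ","); // "AAA" becomes ",A"
}

function removeCommaLayer(str) {
  return str.replace(/,/g, "AA"); // ",A" becomes "AAA" again
}

console.log(addCommaLayer("AAE-4.KAIAAAwAg")); // ,E-4.KAI,AwAg

// Byte accounting: the decoder call costs 19 characters.
var decoderCall = ".replace(/,/g,'AA')";
console.log(decoderCall.length);                // 19
console.log(379 - (353 + decoderCall.length));  // 7
```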

Final stop: 353 bytes... in 505 bytes

So where does that leave us in terms of the "full" thing that'll give us the uncompressed BASE64 data? In order to make use of this file on a webpage, we need to be able to represent it, and decompress it, using JavaScript. All things considered, we can actually pack the 528 byte TTF file using only 505 bytes of JavaScript code!

function generateTinyFont(){return (",E-4.KAI,AwAgT1MvMg-8.Eo-4.VmNtYX-9.Bi-
4.CxnbHlm-9.bw-4.kaGVhZ-9.Cs-4.OGhoZWE-9.5-4.CRobXR4-9.Y-5.GbG9jYQ-8.G0-4.Bm
1heH-9.BC-4.CBuYW1l-9.e-5.gcG9zd-9.I-5.E,B-5.Q,AkgTY18PPPUACwAg-5.LSRoo-6.yl
d0xg-7.QAB-18.E-44.B,E-4.C,I,Q-36.E,Q-75.CMAIw-9.B-31.B,M,Q-4.wAB,g-4.B,E,E,
ABB//8,ABB////w,B-10.gAEg-4.E-16.x,AB-7.B,E,Q,MTcBAQ-7.gAe,M,QQJ,E-7.M,QQJ,I
,g-7.Q-18.").replace(/,/g,'AA').replace(/-\d+\./g,function(m){return(
function(n){return(new Array(++n)).join("A");})(-m);});}

And that's it. 505 characters to generate the BASE64 data for a fully legal TTF data URI which, when used in an @font-face src descriptor, is compatible with Chrome, Firefox, Opera, Safari and Internet Explorer!

Of course, refinements were bound to happen, so we managed to make it even smaller. The long version follows the original page in a "Refinements" section. The short version is that we got it down to 408 bytes for the full generative function, still using the same function name.

The proof is in the pudding.

But you want proof that it works. I anticipated this, and made sure that this very page uses this font. Of course, it's tiny, so you didn't see it, but just for fun, scroll to the top of this page... do you see the text "Let's make a small font"? Copy-paste that text. Anywhere will do, a text editor is the most useful.

Yes, that's a million billion A's in there, that you didn't see because they have been typeset using this tiny, tiny font, which was loaded through an @font-face declaration in a dynamically created <style> element using the BASE64 string generating function that is in that code block just above.

This stuff is amazing.
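The @font-face rule itself isn't shown here, so here's a minimal sketch of how one could wrap the generated BASE64 string in a rule and inject it through a dynamically created <style> element. The family name 'tinyfont' and the 'font/ttf' MIME type in the data URI are assumptions, not necessarily what this page uses.

```javascript
// Hypothetical helpers: 'tinyfont' and the 'font/ttf' MIME type are assumptions.
function tinyFontCSS(base64) {
  return "@font-face{font-family:'tinyfont';" +
    "src:url('data:font/ttf;base64," + base64 + "') format('truetype');}";
}

// Browser only: add the rule through a dynamically created <style> element.
function installTinyFont(base64) {
  var style = document.createElement("style");
  style.appendChild(document.createTextNode(tinyFontCSS(base64)));
  document.getElementsByTagName("head")[0].appendChild(style);
  return style;
}

// Usage in a page: installTinyFont(generateTinyFont());
```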

So... what did you do this for?

In a nutshell: detecting when @font-face custom fonts have finished loading. Loading custom fonts on a webpage takes time, because the font has to be downloaded, and even when it has been downloaded, it has to be loaded into memory. So, between "showing the page" and "showing the page with the right font", there's a period where the text has been typeset, but with the wrong font. Best case, this period is imperceptibly short. But the bigger the fonts, the bigger the problem. Even a 100 KB font can cause a website to "flip" between the initial fallback font and the intended font. If that's the page's main font, the user experience is ruined. For web animation using the canvas, the problem is even bigger, because you might see an animation that's started before all the fonts are done downloading, and the text will suddenly change typeface a few, or maybe even 100, frames into the animation.

What this font lets you do is typeset a div with "font-family: 'Desired Font', tinyfont", and then poll its dimensions over a few milliseconds. You can detect whether a font is really available using either of two approaches: 1) if the div has a width that is "essentially 0px", the desired font's not loaded yet; or 2) if the div has the same width as a reference div typeset with only the tinyfont, the desired font's not loaded yet. This makes it possible to accurately tell whether a browser is ready to show what you want to show people, rather than allowing it to show users something you didn't want them to see at all.
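The two checks above can be sketched as follows. This is not the processing.js code; fontLooksLoaded, pollFontLoad, the 0.5px tolerance and the measure() callback are all hypothetical. The measuring is passed in as a function so the decision logic doesn't depend on the DOM.

```javascript
// Hypothetical helper. testWidth: width of text styled
// "font-family: 'Desired Font', tinyfont"; refWidth: the same text in tinyfont
// only. epsilon (px) is an assumed tolerance.
function fontLooksLoaded(testWidth, refWidth, epsilon) {
  if (epsilon === undefined) epsilon = 0.5;
  // Approach 1: an (essentially) zero width means tinyfont is still in use.
  if (testWidth <= epsilon) return false;
  // Approach 2: same width as the tinyfont-only reference means the same.
  if (Math.abs(testWidth - refWidth) <= epsilon) return false;
  return true;
}

// Poll measure() every intervalMs until the font looks loaded, or give up
// after maxTries. measure() must return {test: number, ref: number}.
function pollFontLoad(measure, onDone, intervalMs, maxTries) {
  var tries = 0;
  (function check() {
    var m = measure();
    if (fontLooksLoaded(m.test, m.ref)) return onDone(true);
    if (++tries >= maxTries) return onDone(false);
    setTimeout(check, intervalMs);
  })();
}

// In a browser, measure() could read offsetWidth from two shrink-to-fit elements:
// pollFontLoad(function () {
//   return { test: testSpan.offsetWidth, ref: refSpan.offsetWidth };
// }, function (ok) { /* start drawing */ }, 10, 500);
```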

My personal motive was to work this into processing.js, which is the JavaScript interpreter for Processing source code, used to write visual programs such as graphics and animations. You can imagine how shitty an animation would be if it used "sans-serif" for the first 10 or 20 frames, before using that beautiful font you bought specifically for your animation =)

-- Mike "Pomax" Kamermans
nihongoresources.com

You can reach me via email at "Pomax" at nihongoresources, but you'll also find me on several IRC networks under the same name, mostly on irc.mozilla.org, irc.freenode.net, irc.highway.net, irc.aniverse.com, and flakily on enterthegame and EFnet.

This page was created on July 28th, 2011, and last updated on August 1st.


Refinements

After showing this article to some of the other devs in the #processing.js channel on Mozilla's IRC server, Yuri Delendik started playing with the data too, and managed to come up with a ridiculously cunning rewrite that compacts BASE64 strings in a different way, but still based on A-reduction.

I'll show you what he did using the July 28 BASE64 string. That one was based on the 544 byte TTF, which still had real values for table checksums rather than zeroes, had a full POST table, and had most of the OS/2 table filled in with 'real' values. Don't worry, the final refinement in this section uses the current final BASE64 code, so you can copy-paste that function if you want to use it yourself.

function generateTinyFont1(i){i=0;return'AEAKAIAwAgT1MvMnGVcNQAEoAVmN
tYXADAB0ABiACxnbHlmAQRiOQAbwAkaGVhZN4GVFwACsANmhoZWEABA5ACRobXR4AYAGbG9jYQAS
AgAG0ABm1heHABACABCACBuYW1lARwAeAgcG9zdADAIAIABAQAkgTY18PPPUACwAgALSRooAyld0
xgAQABAAEAAAAABAEACAIAQAAAAQIAgAQAAAAAAAbm9vcABACMAIwABABAAAABAMAQAwABAgABAE
AEABB//8ABB////wABAgAEgAEAAxABABAEAQAMTcBAQAgAeAMAQQJAEAMAQQJAIAgAwAAAA=='.
replace(/A/g, function(a){return(new Array(~~'130202310230232306384012311233
2415414200456097999931311999012999999501006999011301311221903952611106011161
1169999'[i++]+2)).join(a);});}

This makes it hard to read in a different way. Instead of placing markers for whole AAA... sequences, every run of A's is collapsed into single A's, each of which stands for between 1 and 10 original A's (longer runs are split up). For each remaining A, the number of A's to add back, 0 to 9, is stored as one digit of a long numerical string. Because each count is a single digit, the n-th A in the compacted string can simply use n as an index into that string: if the numerical string starts with "130", the first A in the compacted string gets 1 extra A after it, the second gets 3 additional A's, the third gets none, and so on. I think this is terribly clever.
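Yuri's encoder isn't shown, so here's one possible way to produce the compacted string and its digit string, with a decoder in the style of generateTinyFont1. Both function names are hypothetical, and splitting long runs into chunks of at most 10 A's is an assumption consistent with the 9's in his digit string.

```javascript
// Hypothetical encoder: collapse each run of A's into single A's, each covering
// at most 10 A's, and record (covered A's - 1) as one digit per emitted A.
function packRuns(b64) {
  var digits = "";
  var packed = b64.replace(/A+/g, function (run) {
    var out = "";
    for (var n = run.length; n > 0; n -= 10) {
      var take = Math.min(n, 10); // one A covers at most 10 A's (digit 9)
      out += "A";
      digits += String(take - 1);
    }
    return out;
  });
  return { packed: packed, digits: digits };
}

// Decoder in the style of generateTinyFont1: the i-th A becomes digits[i] + 1 A's.
function unpackRuns(packed, digits) {
  var i = 0;
  return packed.replace(/A/g, function (a) {
    return new Array(~~digits[i++] + 2).join(a);
  });
}

var r = packRuns("AAEAAAAKAIAAAwAg");
console.log(r.packed, r.digits); // AEAKAIAwAg 13020
```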

Getting close to 544 bytes of total code...

But this wasn't enough, so he went one step further and produced this:

function generateTinyFont(i){i=0;return'AEAKAIAwAgT1MvMnGVcNQAEoAVmNt
YXADAB0ABiACxnbHlmAQRiOQAbwAkaGVhZN4GVFwACsANmhoZWEABA5ACRobXR4AYAGbG9jYQASA
gAG0ABm1heHABACABCACBuYW1lARwAeAgcG9zdADAIAIABAQAkgTY18PPPUACwAgALSRooAyld0x
gAQABAAEAAAAABAEACAIAQAAAAQIAgAQAAAAAAAbm9vcABACMAIwABABAAAABAMAQAwABAgABAEA
EABB//8ABB////wABAgAEgAEAAxABABAEAQAMTcBAQAgAeAMAQQJAEAMAQQJAIAgAwAAAA=='.
replace(/A/g,function(){return'AAAAAAAAAA'.substr(~'130202310230232306384012
3112332415414200456097999931311999012999999501006999011301311221903952611106
0111611169999'[i++])})}

This shaves off a few more bytes, and still works because the run of A's to insert is no longer generated through an array join, but by using each single digit of the number string as a start offset for substr on the ten-character string "AAAAAAAAAA". Cleverly, the digit goes through a squiggly tilde first. This is the bitwise "not", which turns 0 into -1, 1 into -2, and so forth, and a negative start offset counts from the end of the string. So the second A, whose digit is 3, is replaced with "AAAAAAAAAA".substr(~3), which is "AAAAAAAAAA".substr(-4), which is "AAAA". This is quite smart!

Of course, we can also get rid of that ~ by flipping the digits in the number string, so that each digit d becomes 9 - d:

function generateTinyFont(i){i=0;return"AEAKAIAwAgT1MvMnGVcNQAEoAVmNt
YXADAB0ABiACxnbHlmAQRiOQAbwAkaGVhZN4GVFwACsANmhoZWEABA5ACRobXR4AYAGbG9jYQASA
gAG0ABm1heHABACABCACBuYW1lARwAeAgcG9zdADAIAIABAQAkgTY18PPPUACwAgALSRooAyld0x
gAQABAAEAAAAABAEACAIAQAAAAQIAgAQAAAAAAAbm9vcABACMAIwABABAAAABAMAQAwABAgABAEA
EABB//8ABB////wABAgAEgAEAAxABABAEAQAMTcBAQAgAeAMAQQJAEAMAQQJAIAgAwAAAA==".
replace(/A/g,function(){return'AAAAAAAAAA'.substr("8697976897697676936159876
8876675845857995439020000686880009870000004989930009886986887780960473888939
888388830000"[i++])})}

Another byte saved. At this point, though, we ran into a wall: we couldn't come up with a better compression based on this decomposition.
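Why the flip works can be checked directly: substr(~d) is substr(-(d + 1)), which yields d + 1 A's, while substr(k) for 0 ≤ k ≤ 9 yields 10 - k A's, so storing k = 9 - d gives the same run. A short sketch (flipDigits is a made-up name) that converts the digit string and checks all ten digits:

```javascript
// Hypothetical helper: turn the tilde-version digits into the flipped digits.
function flipDigits(digits) {
  return digits.replace(/\d/g, function (d) { return String(9 - d); });
}

var TEN_AS = "AAAAAAAAAA";
for (var d = 0; d <= 9; d++) {
  // Both decoders emit d + 1 A's for every digit d.
  var viaTilde = TEN_AS.substr(~d);
  var viaFlip = TEN_AS.substr(9 - d);
  console.log(d, viaTilde.length, viaTilde === viaFlip);
}

console.log(flipDigits("1302023102")); // 8697976897
```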

Blasting through the 544 limit

However, after some more tinkering, Yuri outdid himself and came up with a way that removes another 80-or-so bytes:

function generateTinyFont(){return'#E3KAI2wAgT1MvMnGVcNQ2Eo3VmNtYX#DA
B02Bi3CxnbHlmAQRiOQ2bw3kaGVhZN4GVFw2Cs3NmhoZWEAB3253CRobXR47AY3AGbG9jYQAS#g2
G03Bm1heH#B#C2BC3CBuYW1l3Rw2e3AgcG9zd#D3#I3AI#B3AQ2kgTY18PPPUACwAg3ALSRoo3#y
ld0xg32QAB77#E777773B#E3C#I#Q77732QI#g2Q77777777#bm9vcAB#CMAIwAB32B77732B#M#
Q3wAB#g3B#E#E2BB//82BB////w#B7#gAEg3E77x2B32B#E#Q#MTcBAQ32gAe#M#QQJ#E32M#QQJ
#I#g32w77777=='.replace(/#/g,'AA').replace(/[237]/g,function(a){return'AAAAA
AAA'.substr(7-a)})}

This is a well-counted 468 bytes, and a great example of how no matter how good you are, there's someone out there just as keen to give things a shot as you are. Put some heads together and magic happens.

Of course, he wasn't done. With some careful merging of the two replace operations into a single one, the code got further tightened to 456 bytes!

function generateTinyFont(){return'#E3KAI2wAgT1MvMnGVcNQ2Eo3VmNtYX#DA
B02Bi3CxnbHlmAQRiOQ2bw3kaGVhZN4GVFw2Cs3NmhoZWEAB3253CRobXR47AY3AGbG9jYQAS#g2
G03Bm1heH#B#C2BC3CBuYW1l3Rw2e3AgcG9zd#D3#I3AI#B3AQ2kgTY18PPPUACwAg3ALSRoo3#y
ld0xg32QAB77#E777773B#E3C#I#Q77732QI#g2Q77777777#bm9vcAB#CMAIwAB32B77732B#M#
Q3wAB#g3B#E#E2BB//82BB////w#B7#gAEg3E77x2B32B#E#Q#MTcBAQ32gAe#M#QQJ#E32M#QQJ
#I#g32w77777=='.replace(/[#237]/g,function(a){return'AAAAAAAA'.substr(~~a?7-a:6)})}
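Here's a readable version of what both decoders do: '#' stands for 2 A's, and the digits '2', '3' and '7' stand for 3, 4 and 8 A's. In the merged form, ~~'#' is 0, which selects substr(6), i.e. 2 A's. One thing worth spelling out: '2', '3' and '7' are also legal BASE64 characters, so the trick only works because the font's BASE64 string never contains them itself ('#' is safe, as it's not in the BASE64 alphabet). The function names below are made up for illustration; the test string is the start of the final 408 byte generator's data.

```javascript
// Two-pass form, like the 468 byte version.
function expandMarkers(packed) {
  return packed
    .replace(/#/g, "AA")
    .replace(/[237]/g, function (a) { return "AAAAAAAA".substr(7 - a); });
}

// Single-pass form, like the 456 byte version: ~~'#' is 0, so substr(6).
function expandMarkersMerged(packed) {
  return packed.replace(/[#237]/g, function (a) {
    return "AAAAAAAA".substr(~~a ? 7 - a : 6);
  });
}

// Hypothetical sanity check: the markers must not already occur in the BASE64 data.
function markersAreSafe(base64) {
  return !/[237#]/.test(base64);
}

console.log(expandMarkers("#E3KAI2wAg")); // AAEAAAAKAIAAAwAg
```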

Final stop: 528 bytes in 704 bytes in 408 bytes

After Yuri wrote this, I made some more improvements to the base TTF file, improving compressibility by creating as many long, continuous stretches of zero bytes as possible, and cutting off part of the POST table, so that the 544 byte TTF became a more regular 528 byte TTF font. Running its BASE64 string through the following script:

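  // Assumes `base64` holds the BASE64 string of the 528 byte TTF.
  var l = base64.split(/(AA(?:A(?:A)?(?:AAAA)?)?)/g); // runs of 2, 3, 4 or 8 A's
  l.push(""); // pad, so that every text chunk is followed by a run entry
  var u = [];
  for (var i = 0; i < l.length; i += 2) {
    u.push(l[i]); // literal text between runs
    // 'AA' becomes '#'; a run of 3, 4 or 8 A's becomes the digit 2, 3 or 7
    u.push(l[i+1] == 'AA' ? '#' : (l[i+1].length - 1));
  }
  u.pop(); // drop the -1 pushed for the padding entry
  var y = "function generateTinyFont(){return'" + u.join('').replace(/AA/g, '#')
          + "'.replace(/[#237]/g,function(a){return'AAAAAAAA'.substr(~~a?7-a:6)})}";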

gave us the following generative function for the BASE64 string:

function generateTinyFont(){return'#E3KAI2wAgT1MvMg7Eo3VmNtYX7ABi3Cxn
bHlm7Abw3kaGVhZ7ACs3OGhoZWE7A53CRobXR47AY3AGbG9jYQ7G03Bm1heH7ABC3CBuYW1l7A
e3AgcG9zd7AI3AE#B3AQ2kgTY18PPPUACwAg3ALSRoo3#yld0xg32QAB77#E777773B#E3C#I#
Q77773E#Q7777777772CMAIw7AB77732B#M#Q3wAB#g3B#E#E2BB//82BB////w#B7#gAEg3E7
7x2B32B#E#Q#MTcBAQ32gAe#M#QQJ#E32M#QQJ#I#g32Q77#'.replace(/[#237]/g,
function(a){return'AAAAAAAA'.substr(~~a?7-a:6)})}

That's only 408 bytes!


Why don't you just gzip it?

This is a very good question. Gzipping the font makes it super tiny. The 528 byte TTF file ends up being around 242 bytes (your mileage may vary depending on the exact parameters used), so clearly it might be worth telling the server to gzip it for us. However, that only works for a separate download, and I don't want to rely on a separate download to get the font. Consider that even if the font is tiny, the idea was to have a font available for testing whether downloaded fonts were done. It would be INCREDIBLY silly to first have to download the font that is supposed to check whether fonts have downloaded =)

I need this to be available even before the DOM is ready, and that means inlining it as a script. Sadly, that also means binary data is out. And don't even think about gzipping just the font, BASE64-ing that, and then adding a javascript inflater to unpack it inline. You'll end up putting on weight... So that's one part of the answer.

The other part is that you now have two options. If your server compresses the plain BASE64 data better than it compresses the generative function: use the BASE64 version. If your server is set up to use maximum compression, you can probably shave about 50 bytes off by using the BASE64 string rather than the compressed javascript version. No one's stopping you from making sure you send out the least amount of bytes.

But if you don't have that luxury, a 408 character javascript function is a lot smaller than a 704 byte BASE64 string. While typically you're going to want to go for plain BASE64 on the page, now you have an informed choice.

(To prove the generative function works, though, obviously this page uses the compressed 408 byte javascript generator)
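To make that choice for your own setup, you could measure both candidates offline. Here's a rough sketch using Node's zlib at level 9 (mirroring gzip -9); gzipSizes is a made-up helper, and your server's actual compression settings may give different numbers.

```javascript
// Hypothetical helper: raw and gzipped sizes for a set of candidate payloads.
var zlib = require("zlib");

function gzipSizes(candidates) {
  var sizes = {};
  Object.keys(candidates).forEach(function (name) {
    var raw = Buffer.from(candidates[name], "utf8");
    sizes[name] = {
      raw: raw.length,
      gzipped: zlib.gzipSync(raw, { level: 9 }).length
    };
  });
  return sizes;
}

// Usage (fill in the real strings):
// console.log(gzipSizes({ base64: base64String, generator: generatorSource }));
```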

What about embedding it as gzipped image data in a PNG? Those get unpacked by the browser.

That's actually a really cool idea, and some people have already experimented with that notion even before HTML5 was "HTML5". The long and short of it is that you can indeed embed arbitrary data as a gzipped block inside a PNG, using pixel values to encode binary data (they're both bytes, it's perfectly legal), and then extracting the data again by grabbing the image's pixel data by drawing it on a canvas and walking through the pixel array.

Seriously, this is really cool. The problem with this is that it's only really cool for big payloads, where the act of gzipping the data saves you 500 bytes or more. For small payloads it actually bloats things tremendously: The PNG header adds a little over 70 bytes to the gzipped payload, so that's not so bad, but in order to load the image, make a canvas, get its 2D context, draw the image using this context, get the pixel data, then run through the data, decode and buffer it in a string, then load that string as javascript, you need about 400 bytes worth of javascript. And that's without data validation.
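For the curious, the "walk through the pixel array" step could look roughly like this. The layout is an assumption: payload bytes stored in the R, G and B channels of consecutive, fully opaque pixels, with alpha ignored, since canvas implementations may not round-trip the colour values of translucent pixels exactly. Both function names are hypothetical.

```javascript
// rgba: the .data array from getImageData (4 entries per pixel);
// length: payload size in bytes. Alpha entries are skipped.
function bytesFromRGBA(rgba, length) {
  var out = [];
  for (var i = 0; i < rgba.length && out.length < length; i++) {
    if (i % 4 === 3) continue; // skip the alpha channel
    out.push(rgba[i]);
  }
  return out;
}

// Turn byte values into a string, e.g. ASCII script source.
function bytesToString(bytes) {
  var s = "";
  for (var i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// Browser side (not runnable in Node):
// var ctx = canvas.getContext("2d");
// ctx.drawImage(img, 0, 0);
// var data = ctx.getImageData(0, 0, img.width, img.height).data;
// var source = bytesToString(bytesFromRGBA(data, payloadLength));
```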


It doesn't work!

You need javascript enabled. Don't worry, this page doesn't use google analytics or some other nonsense. And feel free to check the source code if you don't believe me. This page doesn't use any imported scripts. If you're in IE, depending on your security settings, you might get a warning because of the data-URI. You can safely allow this (but again, you are responsible for making sure I'm not lying. I know I'm not, but you only have my word for it, and you don't know me).

It doesn't work in Firefox 3.6!?

So I've been told. I'll need to find a copy of that browser somewhere before I can test that. It's become a rather old browser at this point, so if it doesn't work in 3.6.x because the 544 byte TTF font is not accepted as an @font-face, I don't think I'll actually try to come up with a font reduction that also works for 3.6.x. Mozilla is trying to get people moved off of 3.6.x, with 3.6.x slated for imminent "we no longer support this application", and to be honest: if you want to keep up with the internet, update your browser too. Even though 3.6.x was comfortable when it first came out, it's outdated by now in terms of how it manages its resources, and what it supports. Move with the times, etc. etc. you know the drill. You told your friends and family the same thing a few years ago when they were still using IE =)

It also doesn't work in [...] on [...]!?

Send me an email, to pomax at nihongoresources, and I'll see what I can do. The more current browsers this works on, the better. However, I'm only trying to get this to work for browsers that support HTML5 (if your browser doesn't support the <canvas> element, for instance, I'm going to tell you to get a better browser instead).