What's the Deal With Flag Emoji?

Created: Fri Dec 02, 2022

Flag emoji like 🇨🇦 aren't actually single characters the same way that 🥧 or 😻 are. Instead, they're made up of two consecutive characters called regional indicators. There's one of these for each letter in the English alphabet, and if you use them to write a country's two-letter code, they'll show up as that country's flag. Unless you're on Windows.
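To make that concrete, here's a minimal Python sketch of the letter-to-indicator mapping. The name `flag_from_code` is made up for illustration. The only fact it leans on is that the regional indicators are the 26 consecutive code points from U+1F1E6 (for A) to U+1F1FF (for Z).

```python
# Regional indicator symbols run from U+1F1E6 (A) to U+1F1FF (Z).
REGIONAL_INDICATOR_A = 0x1F1E6


def flag_from_code(code: str) -> str:
    """Spell a two-letter code such as "CA" in regional indicators.

    Hypothetical helper: it does not check that the code belongs to a
    real country, only that it is two ASCII letters.
    """
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


canada = flag_from_code("ca")
print(canada)                               # drawn as a flag where supported
print(len(canada))                          # 2: two characters, not one
print([f"U+{ord(c):04X}" for c in canada])  # ['U+1F1E8', 'U+1F1E6']
```

Whether the result is drawn as a flag or as two boxed letters is up to whatever font ends up displaying it.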

Why'd they do it this way, instead of just having a character for each flag? A couple hundred extra characters sure sounds like a lot, but surely there's room if they can fit in weird math symbols like ⦕ and ⨗ and ⋚ (less than or equal to or greater than, not to be confused with ⋛ greater than or equal to or less than).

Or everyone's favourite, the Multiöcular O ꙮ, which appeared in a single document and which, they noticed over a decade later, they had copied wrong.

It's also dwarfed by the tens of thousands of Chinese characters Unicode has, some of which don't even exist!

Doing it this way does save the Unicode folks a lot of trouble if countries change their flags or pop in and out of existence, and it means they don't have to deal with the thorny question of who even counts as a country. They do maintain a list of flag emoji and how they look, but these concerns are still mostly passed on to vendors.

But that's not the real reason, which has its roots in the series of events that filled the web with emoji in the first place.

Unicode sprang from the discovery that after the invention of computers, letting them talk to each other was the second-biggest mistake. Computers only operate on sequences of ones and zeroes, so they need rules for what those sequences mean. If you copy the bits from one computer to another, and the two disagree on what they mean, you're gonna have problems.

One kind of rule is a text encoding, which maps numbers to writing symbols, like letters and punctuation. Different languages write with different symbols, so there ended up being a whole bunch of these encodings: a whole bunch of rules that said the same numbers meant different symbols.

So if you send some text written in one language, like, say, your address in Russia, to someone whose computer is using a different language, like, say, a friend in France who wants to send you a parcel, what they'll see is a mess of symbols that they'll dutifully copy down because of course it's supposed to look weird, it's in Russian.
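Here's a small Python sketch of that parcel mishap. The address is invented, and the choice of KOI8-R on the Russian side and Latin-1 on the French side is an assumption for the sake of the example. Any pair of mismatched single-byte encodings would misbehave in a similar way.

```python
# An invented address, purely for illustration.
address = "Москва, ул. Тверская"

# The sender's computer turns the text into bytes with a Cyrillic encoding.
raw = address.encode("koi8_r")

# The receiver's computer reads the very same bytes with a Western
# European encoding, so each byte maps to a different symbol.
garbled = raw.decode("latin_1")
print(garbled)

# Nothing was lost in transit: the bytes are intact, and decoding them
# with the rules the sender used brings the address back.
recovered = raw.decode("koi8_r")
print(recovered == address)  # True
```

The bytes never changed. Only the rulebook used to read them did.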

So, along came Unicode with the mission to create one big character set that could represent text in any language, and even multiple at once! To do this, their character set would have to include any textual symbols that the world's computers were already using.

In Japan, phone carriers had some space left over in their text encodings, so they used it to add some pictures (絵, e) that you could type like any other character (文字, moji). These picture characters were called emoji, a word that has no etymological connection to the similar-sounding emoticon.

Because emoji were text that people used to write things, Unicode had no choice but to include them, and a proposal was submitted.

Among the facial expressions and vegetables and feline facial expressions and blood types are a half-score of symbols so unassuming that they don't even get reference images: these are the flag emoji, referred to here as regional indicators.

They weren't supposed to be flags originally; they were just there for compatibility. But vendors ended up giving them pictures anyway.

And there were only ten of them, based on what the carriers had had. Unless you count the crossed flags 🎌 from the 'Celebration' category, in which case Japan got to have two emoji flags to itself. Or three?

Germany and Ireland had an objection to this.

N3583 proposes to add a single Joker character, but this makes no sense in the absence of an encoded set of cards as proposed in N2760 “Proposal to encode dominoes and other game symbols in the UCS”.

Err, no, not that. But that's a good point. What they objected to was there being 10 countries singled out to have their own flag emoji.

Germany (one of the 10 countries “lucky” enough to be included in emoji glyph sets) concurs with Ireland (one of the 193 countries “unlucky” enough not to be included in emoji glyph sets) that this is an inappropriate use of the UCS. Politically, the proposal made in N3583 is both naïve (no one is fooled) and untenable (being prejudicial against 95% of the world’s sovereign states). However, there is a simple way forward that will solve this otherwise insoluble problem.

The Hiberno-German "simple way forward" was to instead have 676 characters, one for every pair of English letters. These wouldn't represent flags necessarily, but anything that could need 2-letter codes.

This was rejected on the grounds that these would be too ambiguous and meaningless, so they came back with a reworked proposal.

They reïterate their position that only having the original ten just won't do:

The usage scenario for the 10 characters is simple: they are represented by national flag glyphs in current implementations, in order to indicate the 10 countries, which (evidently) are (or were when the glyph set was designed) important to users of Japanese telephones. These would have traditionally been rejected from encoding in the UCS as “logos”. However, if the US wishes us to consider that it is necessary for us to agree to encode these 10, we would like to make it clear that we can only do so in the context of the encoding of a complete set of these entities. Otherwise we cannot see a way to support the 10. We don’t actually want the 10 encoded either, but if they must be, the encoding must be comprehensive.

They did change their stance on the fix, though: the letter pairs would now stand for countries, and only the pairs that the ISO says stand for a country would be added.

But in the end, they settled on the system we have now: a new regional indicator character for each English letter, with vendors left to draw a pair as a flag if it spells out a country code.

But there are also emoji for flags that aren't countries! Like the pirate flag 🏴‍☠️, the rainbow flag 🏳️‍🌈, and the flags of the countries making up the U.K. like 🏴󠁧󠁢󠁳󠁣󠁴󠁿 which may or may not display properly as the Scottish flag because those are special for forthcoming reasons.

The first two use another common emoji trick. Some sequences of other emoji, with a special character called a Zero-Width Joiner between them, are combined into a single new one. The pirate flag is a black flag 🏴 (another non-country flag! This one is its own character) zero-width joined to a skull and crossbones ☠️. The rainbow flag is a white flag 🏳️ zero-width joined to a rainbow 🌈.
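As a rough Python sketch, the two sequences can be put together by hand. One detail the description above glosses over: these sequences also carry a variation selector (U+FE0F), which asks for the colourful emoji rendering of a character that could otherwise be drawn as plain text. The constant names here are just labels picked for the example.

```python
import unicodedata

ZWJ = "\u200D"             # ZERO WIDTH JOINER
VS16 = "\uFE0F"            # VARIATION SELECTOR-16 (emoji presentation)
BLACK_FLAG = "\U0001F3F4"  # WAVING BLACK FLAG
WHITE_FLAG = "\U0001F3F3"  # WAVING WHITE FLAG
SKULL = "\u2620"           # SKULL AND CROSSBONES
RAINBOW = "\U0001F308"     # RAINBOW

pirate_flag = BLACK_FLAG + ZWJ + SKULL + VS16
rainbow_flag = WHITE_FLAG + VS16 + ZWJ + RAINBOW

for emoji in (pirate_flag, rainbow_flag):
    print(emoji)
    for ch in emoji:
        # Spell out each code point hiding inside the "single" emoji.
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")
```

A font that doesn't know a given combination typically just draws the pieces next to each other.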

This same trick is used for gendering emoji, like joining a basketballer ⛹️ with a female sign ♀️ to get a woman playing basketball ⛹️‍♀️.

It's not used for skin tones like 🏾, though. Those go right after the emoji they modify, with no zero-width joiner of their own, to give you ⛹🏾‍♀️.
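The difference between the two mechanisms is easier to see with the code points laid out. This is a sketch with made-up constant names. It assumes the usual encoding in which the skin tone modifier directly follows the base emoji and takes the place of the variation selector (U+FE0F) that would otherwise sit there.

```python
ZWJ = "\u200D"              # ZERO WIDTH JOINER
VS16 = "\uFE0F"             # VARIATION SELECTOR-16 (emoji presentation)
PERSON_WITH_BALL = "\u26F9"
FEMALE_SIGN = "\u2640"
MEDIUM_DARK = "\U0001F3FE"  # EMOJI MODIFIER FITZPATRICK TYPE-5

# Gender: base emoji, joiner, then the gender sign.
woman_with_ball = PERSON_WITH_BALL + VS16 + ZWJ + FEMALE_SIGN + VS16

# Skin tone: the modifier is attached straight to the base emoji.
# The one joiner in the sequence is still the one for the gender sign.
toned = PERSON_WITH_BALL + MEDIUM_DARK + ZWJ + FEMALE_SIGN + VS16

for emoji in (woman_with_ball, toned):
    print(emoji, " ".join(f"U+{ord(ch):04X}" for ch in emoji))
```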

The U.K. country flags are different still. Those are a black flag followed by a different set of alphabet-based characters called tag letters. These are used to spell out a code for a subdivision of a country, like gbsct for Scotland, and then the sequence is capped off with a cancel tag.
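Sketched in Python, that recipe looks something like the following. The helper name `subdivision_flag` is hypothetical. It relies on the tag characters mirroring ASCII at an offset of 0xE0000, so the tag for `g` (U+0067) is U+E0067, and on the cancel tag being U+E007F.

```python
BLACK_FLAG = "\U0001F3F4"  # WAVING BLACK FLAG
TAG_OFFSET = 0xE0000       # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"  # ends the tag sequence

ALLOWED = "abcdefghijklmnopqrstuvwxyz0123456789"


def subdivision_flag(code: str) -> str:
    """Build a tag sequence for a subdivision code such as "gbsct".

    Hypothetical helper: it does not check that the code is one that
    any vendor actually draws a flag for.
    """
    code = code.lower()
    if not code or any(c not in ALLOWED for c in code):
        raise ValueError(f"expected ASCII letters and digits, got {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in code)
    return BLACK_FLAG + tags + CANCEL_TAG


scotland = subdivision_flag("gbsct")
print(scotland)       # the Scottish flag, or a plain black flag
print(len(scotland))  # 7: the flag, five tag letters, and the cancel tag
```

The tag characters are invisible on their own, which is why an unsupported sequence tends to show up as just the black flag.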

This was originally supposed to be able to represent a whole bunch of non-country flags like California and Ontario, but it turns out that that's way too many flags so that's not gonna be happening anymore.

And because no exception is complete without its own exceptions, the non-country flag emoji of the European Union 🇪🇺 and Antarctica 🇦🇶 get to be regional indicator pairs.
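You can check this by going the other way and reading the letters back out of a flag. A minimal sketch, with a made-up function name, assuming only that the regional indicators start at U+1F1E6 for A:

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # regional indicator for the letter A


def code_from_flag(flag: str) -> str:
    """Turn a string of regional indicators back into plain letters."""
    letters = []
    for ch in flag:
        offset = ord(ch) - REGIONAL_INDICATOR_A
        if not 0 <= offset < 26:
            raise ValueError(f"U+{ord(ch):04X} is not a regional indicator")
        letters.append(chr(ord("A") + offset))
    return "".join(letters)


european_union = "\U0001F1EA\U0001F1FA"
antarctica = "\U0001F1E6\U0001F1F6"

print(code_from_flag(european_union))  # EU
print(code_from_flag(antarctica))      # AQ
```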