Title: Rich Text, Poor Text
Author: Adam Moore (LÆMEUR) <adam@laemeur.com>
Date: February 9, 2013
Revisited: January 17, 2014

Rich Text, Poor Text

Bold, italic, subscript, superscript, underlines, strike-throughs: I don't find any of these presentational attributes of text any more frivolous than quotation marks and exclamation points. I mean, really, if the goal were to be starkly minimalistic about it, we could write prose for electronic transmission with letters, spaces and line-breaks, and throw out all the explicit markup. We don't call it "markup" when it's been around for more than fifty years; we call it punctuation instead, but it's the same thing.

THE WRITTEN WORD HAS QUITE A FORCEFUL APPEARANCE THIS WAY

ALTHOUGH AS STATED IN MY LAST ESSAY IT IS SOMEWHAT LACKING IN NUANCE

IT LACKS ALSO FLEXIBILITY

ONE MUST STRUCTURE THEIR STATEMENTS WITH CONSIDERABLY MORE CLARITY WHEN THEY DO NOT HAVE THE CRUTCH OF COMMAS AND PARENTHESES AND OTHER DELIMITERS TO LEAN UPON

But nobody wants to have to live in that world. We all recognize the expressive doors opened up by our little tool-box of commas and asterisks and hyphens and slashes. And the utility of presentational attributes like bold text and underlines for clarity and expressiveness is no less appreciated. In fact, their availability when composing text on the computer is now taken for granted, to the extent that I don't even have the option of composing plain-text messages in Gmail anymore(1).

But there's a problem with the way these attributes are stored on the computer. Back in the 1960s, when the American Standard Code for Information Interchange was being worked out and the decisions were being made about what to encode in the meager 7-bit code space, there wasn't room enough to store additional presentational information about each character, so that information was necessarily left out. Only the basics made it in: letters, numbers, punctuation, and some control codes.

The only way to get presentational information into your text was to start embedding information about the information within the information. That is to say that within a stream of bytes, some of those bytes would represent a message, and some of those bytes would represent how to present that message. ANSI did this near the hardware level with escape sequences in the 1970s, and innumerable schemes have been arrived at for doing this with software throughout the decades.
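To make the mixing concrete, here is a small sketch in Python of what an escape-sequence stream looks like. It uses the common SGR codes ("ESC [ 1 m" for bold, "ESC [ 0 m" to reset); the helper names embolden and strip_sgr are my own inventions for illustration, not part of any standard.

```python
import re

ESC = "\x1b"              # the ASCII escape control code (decimal 27)
BOLD_ON = ESC + "[1m"     # SGR sequence: switch bold on
RESET = ESC + "[0m"       # SGR sequence: reset all attributes

# Matches SGR sequences only (ESC [ digits-and-semicolons m).
SGR_PATTERN = re.compile(r"\x1b\[[0-9;]*m")


def embolden(text):
    """Wrap text in presentation bytes (hypothetical helper)."""
    return BOLD_ON + text + RESET


def strip_sgr(stream):
    """Recover the message by deleting the presentation bytes."""
    return SGR_PATTERN.sub("", stream)


message = "This is " + embolden("important") + "."

# The stream is longer than the message it carries: eight of its
# characters are instructions about the text, not text.
print(len(message), len(strip_sgr(message)))
print(repr(message))
```

Anything that wants to count, search, or compare the words has to know how to find and skip the instructions first.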

The problem with this approach is that it pollutes text-streams with non-text information. Ted Nelson explains the larger implications of this in his article, "Embedded Markup Considered Harmful."

My further objection to using embedded markup for these presentational attributes is that omitting them from the character coding scheme denies them standing as elements of language; to use some Nelsonian terminology, they are treated as packaging rather than content.

I maintain that they are just as much language content as the exclamation mark is.

In the 1980s, when Joe Becker proposed Unicode, a "wide-body ASCII", to encompass all of the world's alphabets and syllabaries and ideograms, no provision was made for encoding presentational attributes. In fact, such "fancy-text" is explicitly unsupported in the Unicode 88 proposal. I simply don't agree with this approach. Unicode has since strayed from its aim of "fixed one-to-one correspondence with characters of the world's writing systems" by supporting multiple-character combinations to add diacritical marks to a glyph — not at all dissimilar from the method of ANSI escape-sequences — yet still it has no standardized coding for ubiquitous, pan-lingual presentational conventions such as bold text.
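For anyone who hasn't run into combining sequences, a short Python sketch using only the standard unicodedata module shows the behavior I mean: one visible glyph stored as a base letter followed by a separate modifier code point.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Both render as the same glyph, but they are different sequences.
print(len(precomposed), len(combining))
print(precomposed == combining)

# Normalization to NFC composes the pair into the single code point.
print(unicodedata.normalize("NFC", combining) == precomposed)

# A non-zero combining class marks a code point that modifies its neighbor.
print(unicodedata.name("\u0301"), unicodedata.combining("\u0301"))
```

The accent is a modifier riding along in the stream after the letter it changes, which is why the comparison to escape sequences seems fair to me.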

Were it my world to command, I'd simply move everything to a 32-bit coding scheme and reserve the top 8 bits for presentational attributes. The lower 24 would remain for whatever 16.7 million characters people can dream up to fill the space with.
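As a rough sketch of how that might look, here is the idea in Python. Everything specific in it is an assumption made for illustration: I haven't decided which bit would mean what, so the flag values below (BOLD, ITALIC and so on) are hypothetical, as are the function names.

```python
# Hypothetical attribute flags for the top 8 bits of each 32-bit unit.
BOLD = 0x01
ITALIC = 0x02
UNDERLINE = 0x04
STRIKE = 0x08
SUPERSCRIPT = 0x10
SUBSCRIPT = 0x20

CHAR_BITS = 24
CHAR_MASK = (1 << CHAR_BITS) - 1   # 0xFFFFFF: 16,777,216 character codes


def pack(ch, attrs=0):
    """Combine one character and its attribute flags into a 32-bit unit."""
    code = ord(ch)
    if code > CHAR_MASK or not 0 <= attrs <= 0xFF:
        raise ValueError("character or attributes out of range")
    return (attrs << CHAR_BITS) | code


def unpack(unit):
    """Split a 32-bit unit back into (character, attribute flags)."""
    return chr(unit & CHAR_MASK), unit >> CHAR_BITS


def plain_text(units):
    """Drop the attributes; no parsing needed, just a mask per unit."""
    return "".join(chr(u & CHAR_MASK) for u in units)


# "very" in bold italic, the rest plain.
units = [pack(c) for c in "a "]
units += [pack(c, BOLD | ITALIC) for c in "very"]
units += [pack(c) for c in " big deal"]

print(hex(pack("A", BOLD)))
print(unpack(pack("A", BOLD)))
print(plain_text(units))
```

Every unit is the same width and carries its own attributes, so there are no sequences to find and skip. Python's chr only accepts values up to 0x10FFFF, which is fine here because pack starts from real characters; a genuine 24-bit code would need its own character table.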

—L.
D29

Afterword

Looking back at this a year after writing it, it seems a little ...hasty? While I think conflating "markup" and "punctuation" was a step too far, particularly in light of the fact that the former term has accrued considerable connotative baggage in the last ~20 years, I do think it's worth investigating the boundaries of orthography (or graphology?) and style; is language just the marks we make, or can it also be the way we make them?

—L.
E17

Notes

1. Someone must have complained, because this option has returned —L. E17
