HCML – One more markup language targeting XHTML generation

Highly Compatible Markup Language, HCML, is a markup language with a syntax related to S-expressions. The name is a bad pun and a typo look-alike, but it is intended to convey that the XHTML produced from it should validate and be easily accessible to as many people as possible, using as many different devices as possible.

The only target for HCML at the time of writing is XHTML 1.0 Strict. Whitespace in the input is in most cases normalized to a single space character, and multiple newlines are normalized to a single one. Like (X)HTML source documents, HCML source files can therefore be formatted relatively freely without affecting the output.

Most elements in HCML map more or less directly to XHTML, though there are several extra limitations. For instance, lists of any kind cannot be nested. The reasons are that nested lists should be of different types for accessibility with screen readers, Strict XHTML 1.0 has no type attribute for lists, and the author did not want to involve CSS. Text not enclosed in a suitable command is treated as erroneous input.

1.0 Basic syntax

Tokens are separated by whitespace; there is no separate command character class. Every command in the language is recognized by an opening curly brace, followed by the name of the command, any parameters for the command, and finally a closing curly brace. A single paragraph of text might look like the following:

{ | No thanks; I do not betray such secrets to quacks. They would be capable of
bungling him even worse for me. But the method is tried and tested. I have
applied it to Molvik as well. Him I have made "daemonic". That is the
fontanelle I had to put in his neck. }

1.1 Rules for escaping, quoting, and text operators

Space characters are escaped with a backslash. Backslash also escapes itself and any other character. There is no support for symbolic escape sequences, e.g. \n will only result in the character n being added to the document.

Three characters must be escaped to safely quote HCML:

  1. \
  2. {
  3. }

Note that all backslashes must be escaped, but curly braces only need to be quoted when they have whitespace on both sides (counting vertical whitespace and the start or end of input as whitespace). “ } ” is a syntactic element, but “word} ” is an arbitrary word with no special handling. This is a direct consequence of the grammar using whitespace as its only token delimiter.
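
The rules above can be illustrated with a small Python sketch of a tokenizer. It is not the lexer used by hcml2xhtml; the names tokenize and _classify are hypothetical, and two points are assumptions: an escaped brace is taken to be an ordinary word, and a lone backslash at the end of input is dropped. The sketch also discards whitespace between tokens, which the real tool must keep for preformatted text.

```python
def _classify(text, escaped):
    """Label a finished token as "open", "close", or "word"."""
    # A brace is syntax only when it is the whole token and unescaped.
    if not escaped and text == "{":
        return ("open", text)
    if not escaped and text == "}":
        return ("close", text)
    return ("word", text)


def tokenize(source):
    """Split HCML source into (kind, text) tokens on unescaped whitespace."""
    tokens = []
    current = []      # characters of the token being built
    escaped = False   # True if any character of the token was escaped
    in_token = False
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            # Backslash escapes the next character, whatever it is.
            # Assumption: a trailing lone backslash is silently dropped.
            if i + 1 < len(source):
                current.append(source[i + 1])
                escaped = True
                in_token = True
            i += 2
            continue
        if ch.isspace():
            if in_token:
                tokens.append(_classify("".join(current), escaped))
                current, escaped, in_token = [], False, False
        else:
            current.append(ch)
            in_token = True
        i += 1
    if in_token:
        tokens.append(_classify("".join(current), escaped))
    return tokens
```

With this sketch, the input “{ | a\ b word} }” yields an opening brace, the words “|”, “a b” and “word}”, and a closing brace.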

1.2 Summary of basic text operators:

\
Escape the next character, to include a whitespace character in a word token or a literal backslash. “a b” is two tokens, while “a\ b” is one. “\\” is a literal backslash. Since this is the mechanism used to handle whitespace itself, backslash is the only operator which is not whitespace delimited and should not be enclosed in curly braces.
<
Outputs an opening curly brace. “{ < }” results in “{”.
>
Outputs a closing curly brace. “{ > }” results in “}”.
_
The underscore operator merges all arguments into a single argument, for use with other commands which do not handle all arguments uniformly. “{ _ a b c }” will create a single token with the value “a b c”.
||
Concatenate all arguments to a single token. “{ || a b c }” results in “abc”.
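
As a rough illustration of the four brace-delimited operators, the following Python sketch maps an operator name and its already tokenized arguments to the resulting tokens. The function name apply_text_operator is hypothetical, and it is an assumption that “<” and “>” ignore any arguments; the actual behaviour of hcml2xhtml may differ.

```python
def apply_text_operator(name, args):
    """Return the list of tokens a basic text operator expands to."""
    if name == "<":
        # Assumption: any arguments are ignored.
        return ["{"]
    if name == ">":
        return ["}"]
    if name == "_":
        # Merge the arguments into one token, separated by single spaces.
        return [" ".join(args)]
    if name == "||":
        # Concatenate the arguments with nothing in between.
        return ["".join(args)]
    raise ValueError(f"unknown text operator: {name!r}")
```

Here apply_text_operator("_", ["a", "b", "c"]) gives the single token “a b c”, while the same arguments passed to “||” give “abc”.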

2.0 Document structure

2.1 Document title

The command “T” sets the document title. The title does not need to be at the start of the document, but a top level heading with a unique ID attribute will be added where the command occurs. A title is mandatory, as a title element is required for a valid XHTML document. Example:

{ T HCML – One more markup language targeting XHTML generation }

2.2 Headers

There are only two levels of headers apart from the document title, created with the commands “H” and “h”. All headers have a unique ID attribute to facilitate linking. Example:

{ H Large header }
{ h Small header }

2.3 Paragraphs

Pipes denote normal paragraphs of text. Paragraphs can only include basic text elements and links. Example:

{ | Pipes denote normal paragraphs of text. Example: }

2.4 Links

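The same mnemonic as in HTML is used for creating links, “a”. As the example shows, the first argument is the link target and the remaining arguments form the link text. Links are one of the very few elements allowed inside most other elements in HCML. Example: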

{ | A paragraph of text with a { a ./README.xhtml circular link. } }

2.5 Illustrations

The command “i” is used to add images to a document. Images must include a text for the “alt” attribute as well as a caption, which will be placed directly below the image. The first argument is the source of the illustration, the second is the alt attribute, and the rest is used for the caption. Example:

{ i ./bust.png
    { _ Bust portraying a Roman male }
    Figure 1: A portrait bust from estimated CE 240 standing on a small pillar
    with the number 52.
}

The above HCML fragment will produce the following XHTML fragment:

<p><img src="./bust.png" alt="Bust portraying a Roman male" />
<br />
Figure 1: A portrait bust from estimated CE 240 standing on a small pillar
with the number 52.</p>

Note the use of the underscore operator, which allows the second argument, the alt attribute, to include spaces without having to escape each space character.
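
The mapping from the three kinds of arguments to the XHTML fragment can be sketched in Python as follows. This is an illustration only, not the code in hcml2xhtml: the function name render_illustration is hypothetical, the escaping of special characters with xml.sax.saxutils is an assumption, and the caption words are joined with single spaces, so line breaks inside the caption are not reproduced.

```python
from xml.sax.saxutils import escape, quoteattr


def render_illustration(src, alt, caption_words):
    """Build the XHTML fragment for an illustration."""
    # Everything after the first two arguments forms the caption.
    caption = " ".join(caption_words)
    return (
        # quoteattr adds the surrounding quotes and escapes the value.
        f"<p><img src={quoteattr(src)} alt={quoteattr(alt)} />\n"
        "<br />\n"
        f"{escape(caption)}</p>"
    )
```

Because the alt text arrives as one argument, it can be passed through unchanged, which is exactly what the underscore operator makes possible in the HCML source.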

2.6 Preformatted text

“m” is used for preformatted text, mnemonic “monospace”. All text, including the whitespace separating the tokens, will be wrapped in “pre” tags in the resulting XHTML, with no normalization or compression of whitespace. Example:

{ m
    def __parse_stream(self):
        token = self.next()
        while token.kind != spacelexer.TokenType.END_OF_STREAM:
            if token.content == "{":
                element = self.resolve_command()
                check_input(element.kind in (ElementType.PARAGRAPH,
                        ElementType.HEADER, ElementType.MONOSPACE,
                        ElementType.LIST))
                self.documentbuffer.append(element)
            else:
                raise SyntaxError("Orphan data")
            token = self.next()
}

2.7 Ordered lists

Ordered lists are created with the “O” command. All arguments must be list items created with the “-” command. Only text and links are allowed inside list items. Example:

{ O
{ - A }
{ - B }
{ - C }
}

2.8 Unordered lists

Unordered lists are created with the “L” command. All arguments must be list items created with the “-” command. Only text and links are allowed inside list items. Example:

{ L
{ - Tables }
{ - Figures }
}

2.9 Definition lists

Lists of definitions are created by the command “D”. Terms to be defined are created with “t”, definitions for these with “d”. No other argument types are allowed. Only text and links are allowed inside the definitions; the terms can only be text. Example:

{ D
{ t term which needs a definition }
{ d Lofty definition going on and on. }
}

3.0 Tools

3.1 hcml2xhtml

Reads HCML from STDIN and emits XHTML to STDOUT. Run with the “-h” flag to get a list of options.

3.2 hcmlquote

The tool hcmlquote reads text from STDIN and emits safely quoted HCML to STDOUT. It is intended for including existing text, like source code, into HCML documents.
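
The idea behind such quoting can be shown with a minimal Python sketch based on the escaping rules in section 1.1. It is not the implementation of hcmlquote, and the function name quote_hcml is hypothetical. The sketch is deliberately conservative: it escapes every curly brace, not only those surrounded by whitespace, on the assumption that an escaped brace is always read as an ordinary character.

```python
def quote_hcml(text):
    """Escape backslashes and curly braces so text is read as plain words."""
    # Escape backslashes first, so the backslashes added for the braces
    # below are not escaped a second time.
    quoted = text.replace("\\", "\\\\")
    quoted = quoted.replace("{", "\\{").replace("}", "\\}")
    return quoted
```

For example, the input “a \ { b }” becomes “a \\ \{ b \}”.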

4.0 Licence

Copyright 2020 Steinar Knutsen

Licensed under the EUPL, Version 1.2 or – as soon they will be approved by the European Commission – subsequent versions of the EUPL (the “Licence”); You may not use this work except in compliance with the Licence. You may obtain a copy of the Licence at:

https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12

Unless required by applicable law or agreed to in writing, software distributed under the Licence is distributed on an “AS IS” basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Licence for the specific language governing permissions and limitations under the Licence.

Steinar Knutsen, 20201105T102731Z, 773F7C9E