
This document describes the grammar of the asciidoc markup language, based on the Asciidoctor User Manual.

About implementation

We try to follow the document syntax rules, but we found some inconsistencies in how a document is parsed and rendered to HTML. For example, the current asciidoctor allows the following inline formatting,

_A `B_ C`

to be rendered into the following HTML tree,

<em>A <code>B</em> C</code>

This is of course rendered correctly when opened in a web browser, but it breaks the tree, since the tags overlap instead of nesting. In the previous implementation, we were able to break it down into the following tree,

<em>
A <code>B</code>
</em>
<code> C</code>

But this opens up many inline formatting permutations, which makes the code more complex than it should be.

This implementation,

  • uses the strict asciidoctor syntax rules which we define in this document.

  • minimizes duplicate markup.

    • Supports only the "<<" ">>" syntax and drops the "xref:" syntax.

Common grammar

EMPTY     = ""

DQUOTE    = %d34  ; "

WORD      = 1*VCHAR           ; Sequence of visible characters without
                              ; white space.

STRING    = WORD *(WSP WORD)  ; Sequence of words with spaces between them.

LINE      = STRING LF         ; STRING that ends with a new line.

TEXT      = 1*LINE            ; One or more LINE.

REF_ID    = 1*ALPHA *("-" / "_" / ALPHA / DIGIT)
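The common rules map closely onto regular expressions. The following is a minimal Python sketch, not the actual implementation; the names is_string and is_ref_id are hypothetical, and WSP is assumed to be a single space or tab.

```python
import re

# WORD   = 1*VCHAR (visible ASCII characters, %x21-7E)
# STRING = WORD *(WSP WORD)
STRING = re.compile(r"[\x21-\x7e]+(?:[ \t][\x21-\x7e]+)*")

# REF_ID = 1*ALPHA *("-" / "_" / ALPHA / DIGIT)
REF_ID = re.compile(r"[A-Za-z]+[-_A-Za-z0-9]*")


def is_string(text: str) -> bool:
    """Report whether the whole text matches the STRING rule."""
    return STRING.fullmatch(text) is not None


def is_ref_id(text: str) -> bool:
    """Report whether the whole text matches the REF_ID rule."""
    return REF_ID.fullmatch(text) is not None
```

With these rules a REF_ID must start with a letter, so "sec-1_a" is accepted while "1abc" and "_abc" are rejected.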

Document header

The document header consists of a title, optional authors, an optional revision, and zero or more attributes. The authors and revision MUST come after the title, in that order. The document attributes can appear in any order after the title, authors, or revision.

DOC_HEADER     = [ "=" SP DOC_TITLE LF
                 [ DOC_AUTHORS LF
                 [ DOC_REVISION LF ]]]
                 (*DOC_ATTRIBUTE)
                 LF

An empty line marks the end of the document header.

Title

DOC_TITLE     = 1*WORD [DOC_TITLE_SEP SUBTITLE]

DOC_TITLE_SEP = ":"

SUBTITLE      = 1*WORD

Author information

DOC_AUTHORS   = MAILBOX *( ";" MAILBOX )

  MAILBOX     = STRING [ "<" EMAIL ">" ]

  EMAIL       = WORD "@" WORD "." 1*8ALPHA
              ; simplified syntax of email format.
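As an illustration of DOC_AUTHORS, the author line can be split on ";" and each MAILBOX matched separately. This is a sketch under assumptions: parse_authors is a hypothetical name, whitespace around each mailbox is trimmed, and the email pattern follows the simplified EMAIL rule above rather than the full email format.

```python
import re

# MAILBOX = STRING [ "<" EMAIL ">" ]
# EMAIL   = WORD "@" WORD "." 1*8ALPHA (simplified)
MAILBOX = re.compile(
    r"(?P<name>[^<>;]+?)\s*"
    r"(?:<(?P<email>[^<>\s]+@[^<>\s]+\.[A-Za-z]{1,8})>)?"
)


def parse_authors(line: str):
    """Return a list of (name, email) pairs; email is None when absent."""
    authors = []
    for part in line.split(";"):
        m = MAILBOX.fullmatch(part.strip())
        if m is not None:
            authors.append((m.group("name"), m.group("email")))
    return authors
```

For example, parse_authors("Jane Doe <jane@example.com>; John Roe") yields one author with an email and one without.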

Revision information

DOC_REVISION     = DOC_REV_VERSION [ "," DOC_REV_DATE ]

DOC_REV_VERSION  = "v" 1*DIGIT "." 1*DIGIT "." 1*DIGIT

DOC_REV_DATE     = 1*2DIGIT 3*ALPHA 4*DIGIT
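The revision line can be checked with a single pattern. The sketch below is only an illustration; parse_revision is a hypothetical name, and it assumes optional white space after the comma and between the day, month, and year, which the grammar above does not spell out.

```python
import re

# DOC_REVISION = DOC_REV_VERSION [ "," DOC_REV_DATE ]
REVISION = re.compile(
    r"(?P<version>v\d+\.\d+\.\d+)"
    r"(?:\s*,\s*(?P<date>\d{1,2}\s*[A-Za-z]{3,}\s*\d{4,}))?"
)


def parse_revision(line: str):
    """Return (version, date) or None if the line is not a revision."""
    m = REVISION.fullmatch(line.strip())
    if m is None:
        return None
    return m.group("version"), m.group("date")
```

A line without the leading "v" is rejected, and the date part is None when only the version is given.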

Attributes

There are also document attributes which affect how the document is rendered,

DOC_ATTRIBUTE  = ":" DOC_ATTR_KEY ":" *LINE ("\" *LINE) LF

DOC_ATTR_KEY   = ( "toc" / "sectanchors" / "sectlinks"
               /   "imagesdir" / "data-uri" / *META_KEY ) LF

META_KEY       = 1*(META_KEY_CHAR / "_") *(META_KEY_CHAR / "-")

META_KEY_CHAR  = ALPHA / DIGIT
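A single-line attribute entry can be parsed as sketched below. This is a simplified illustration, not the real parser: parse_attribute is a hypothetical name, at least one space or tab is assumed between the closing ":" and the value, and the "\" line continuation is not handled.

```python
import re

# ":" META_KEY ":" followed by an optional value on the same line.
ATTR_LINE = re.compile(
    r":(?P<key>[A-Za-z0-9_]+[A-Za-z0-9-]*):(?:[ \t]+(?P<value>.*))?"
)


def parse_attribute(line: str):
    """Return (key, value) or None if the line is not an attribute."""
    m = ATTR_LINE.fullmatch(line.rstrip("\n"))
    if m is None:
        return None
    return m.group("key"), (m.group("value") or "").strip()
```

An attribute without a value, such as ":sectanchors:", is returned with an empty string as its value.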

HTML format

HTML format for rendering the document header,

<div id="header">
  <h1>{DOC_TITLE}{DOC_TITLE_SEP} {SUBTITLE}</h1>
  <div class="details">
    <span id="author" class="author">{DOC_AUTHORS}</span>
    <br>
    <span id="revnumber">{DOC_REV_VERSION} , </span>
    <span id="revdate">{DOC_REV_DATE} </span>
  </div>
</div>

Document preamble

Any content after the document title and before the first section is considered the document preamble, and it is rendered inside the "content", not the "header".

HTML format,

<div id="content">
  <div id="preamble">
    <div class="sectionbody">
      {DOC_PREAMBLE}
    </div>
  </div>
  ...
</div>

Block

BLOCK_REF   = "[#" REF_ID *["." RoleName] "]" LF

Attribute

BLOCK_ATTR  = "[" ATTR_NAME ("=" ATTR_VALUE) *("," ATTR_OPT) "]" LF

ATTR_NAME   = WORD

ATTR_VALUE  = STRING

ATTR_OPT    = ATTR_NAME ("=" ATTR_VALUE)

Table of contents

The table of contents (ToC) is generated if the "toc" attribute is set in the document header with the following syntax,

TOC_ATTR      = ":toc:" (TOC_PLACEMENT / TOC_POSITION )

TOC_PLACEMENT = ("auto" / "preamble" / "macro")

TOC_POSITION  = ("left" / "right")

TOC_MACRO     = "toc::[]"

If the toc placement is empty it defaults to "auto", and the ToC is placed after the document header. If toc is set to "preamble", the ToC is placed after the document preamble. If toc is set to "macro", the ToC is placed after the section title that has TOC_MACRO.

Title

By default the ToC element has its title set to "Table of Contents". One can change the ToC title using the attribute "toc-title",

TOC_TITLE  = ":toc-title:" LINE

Levels

By default only section levels 1 and 2 are rendered. One can change this using the attribute "toclevels",

TOC_LEVELS = ":toclevels:" 1DIGIT
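The effect of "toclevels" can be sketched as a filter over the section list. The code below is a hedged illustration; parse_toclevels and toc_entries are hypothetical names, sections are assumed to be (level, title) pairs, and an invalid value is assumed to fall back to the default of 2.

```python
def parse_toclevels(value: str, default: int = 2) -> int:
    """Parse the single digit after ":toclevels:", else use the default."""
    value = value.strip()
    if len(value) == 1 and value in "0123456789":
        return int(value)
    return default


def toc_entries(sections, toclevels: int = 2):
    """Keep the (level, title) pairs that the ToC would list."""
    return [(lvl, title) for lvl, title in sections if 1 <= lvl <= toclevels]
```

With the default value, a level 3 section is left out of the ToC; setting toclevels to 3 includes it.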

Sections

Sections, or headers, group one or more paragraphs or blocks. Each section starts with the '=' character or '#' (markdown). Six levels of sections are allowed in asciidoc; anything deeper is treated as a paragraph.

SECTION          = [BLOCK_REF]
                   2*6(EQUAL/HASH) 1*WSP LINE LF

HTML format,

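The HTML class for a section is sectN, where N is the level, which is equal to the number of '=' minus 1. The heading element is one level deeper, so a sectN section uses <hM> with M = N + 1 (for example, sect1 uses <h2>).

<div class="sectN">
  <hM>{WORD}</hM>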
  <div class="sectionbody">
    ...
  </div>
</div>
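The relation between the section markers and the level can be sketched as follows. This is an illustration only; parse_section is a hypothetical name, and it assumes that '=' and '#' are not mixed within one marker, which the SECTION rule above does not forbid.

```python
import re

# Two to six '=' or '#', at least one space or tab, then the title.
SECTION = re.compile(r"(?P<marks>={2,6}|#{2,6})[ \t]+(?P<title>\S.*)")


def parse_section(line: str):
    """Return (level, title) or None if the line is not a section title."""
    m = SECTION.fullmatch(line.rstrip("\n"))
    if m is None:
        return None
    level = len(m.group("marks")) - 1  # number of markers minus 1
    return level, m.group("title").rstrip()
```

A line with seven markers, or with no space after the markers, returns None and would be handled as a paragraph.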

Section Attributes

idprefix

":idprefix:" EMPTY / REF_ID

The idprefix must be an ASCII string. It must start with "_", "-", or an ASCII letter; otherwise "_" is prepended. Any character that is not valid is replaced with "_".

idseparator

":idseparator:" EMPTY / "-" / "_" / ALPHA / DIGIT

The idseparator can be empty or a single ASCII character ("_", "-", an ASCII letter, or a digit). It is used to replace invalid REF_ID characters.
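The idprefix rules can be sketched in a few lines. This is an assumption-laden illustration; normalize_idprefix is a hypothetical name, and the set of valid characters is assumed to be the REF_ID characters (ASCII letters, digits, "-", and "_").

```python
def normalize_idprefix(prefix: str) -> str:
    """Apply the idprefix rules: replace invalid characters, fix the start."""
    if prefix == "":
        return ""
    # Replace every character outside the assumed valid set with "_".
    cleaned = "".join(
        c if c.isascii() and (c.isalnum() or c in "_-") else "_"
        for c in prefix
    )
    # The prefix must start with "_", "-", or a letter.
    if not (cleaned[0].isalpha() or cleaned[0] in "_-"):
        cleaned = "_" + cleaned
    return cleaned
```

For example, a prefix that starts with a digit gets "_" prepended, and a space inside the prefix becomes "_".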

Comments

COMMENT_SINGLE = "//" LINE

COMMENT_BLOCK  = "////" LF
                 *LINE
                 "////" LF

COMMENTS = *(COMMENT_SINGLE / COMMENT_BLOCK)

A comment line cannot start with spaces, because a line with leading spaces is parsed as a literal paragraph (see Block literal).
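The comment rules can be sketched as a filter over the input lines. The function name strip_comments is hypothetical, and the input is assumed to be a list of lines without their trailing new line.

```python
def strip_comments(lines):
    """Drop single-line comments and comment blocks from a list of lines."""
    out = []
    in_block = False
    for line in lines:
        if line == "////":
            # A block delimiter opens or closes a comment block.
            in_block = not in_block
            continue
        if in_block:
            continue
        if line.startswith("//"):
            continue
        # Lines with leading spaces are kept, even if "//" follows.
        out.append(line)
    return out
```

Note that the line " // literal" (with a leading space) is kept, since it is not a comment according to the rule above.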

Block listing

LISTING_STYLE = "[listing]" LF TEXT LF

LISTING_BLOCK = "----" LF TEXT "----" LF

Block literal

LITERAL_PARAGRAPH = 1*WSP TEXT

LITERAL_STYLE     = "[literal]" LF TEXT LF

LITERAL_BLOCK     = "...." LF TEXT "...." LF

HTML format,

<div class="literalblock">
    <div class="content">
        <pre>{TEXT}</pre>
    </div>
</div>

Substitution rules,

  • special characters: "<", ">", and "&"

  • callouts

Include Directive

INCLUDE_DIRECTIVE = "include::" PATH "[" ELEMENT_ATTRIBUTE "]"

PATH              = ABSOLUTE_PATH / RELATIVE_PATH

ABSOLUTE_PATH     = "/" WORD *( "/" WORD )

RELATIVE_PATH     = ( "." / ".." ) "/" WORD * ( "/" WORD )
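The include directive can be matched with one pattern, as in the sketch below. It is not the actual implementation; parse_include is a hypothetical name, path segments are assumed not to contain "/", "[", or "]", and the attribute text is returned unparsed.

```python
import re

# PATH is either absolute ("/...") or relative ("./..." or "../...").
INCLUDE = re.compile(
    r"include::(?P<path>(?:/|\.{1,2}/)[^\s\[\]/]+(?:/[^\s\[\]/]+)*)"
    r"\[(?P<attrs>[^\]]*)\]"
)


def parse_include(line: str):
    """Return (path, attrs) or None if the line is not an include."""
    m = INCLUDE.fullmatch(line.strip())
    if m is None:
        return None
    return m.group("path"), m.group("attrs")
```

Following the grammar as written, a bare file name such as "intro.adoc" without a "./" prefix does not match RELATIVE_PATH and is rejected.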

Images

Block image

BLOCK_IMAGE   = "image::" URL "[" IMAGE_ATTRS "]"

IMAGE_ATTRS   = TEXT ("," IMAGE_WIDTH ("," IMAGE_HEIGHT)) *("," IMAGE_OPTS)

IMAGE_OPTS    = IMAGE_OPT_KEY "=" 1*VCHAR

IMAGE_OPT_KEY = "title" / "float" / "align" / "role" / "link"

Inline image

IMAGE_INLINE  = "image:" URL "[" (IMAGE_ATTRS) "]"

Video

BLOCK_VIDEO = "video::" (URL / WORD) "[" ( "youtube" / "vimeo" ) *(BLOCK_ATTR) "]"

Audio

BLOCK_AUDIO = "audio::" (URL / WORD) "["
              ( "options" "=" DQUOTE *AUDIO_ATTR_OPTIONS DQUOTE )
            "]"

AUDIO_ATTR_OPTIONS = "autoplay" / "loop" / "controls" / "nocontrols"

Block attributes

BLOCK_ATTRS = BLOCK_ATTR *( "," BLOCK_ATTR )

BLOCK_ATTR  = WORD "=" (DQUOTE) WORD (DQUOTE)

Inline formatting

There are two types of inline formatting: constrained and unconstrained. Constrained formatting applies only if the character before the opening mark is non-alphanumeric and the character after the closing mark is neither alphanumeric nor an underscore.

FORMAT_BEGIN = WSP / "!" / DQUOTE / "#" / "$" / "%" / "&" / "'" / "(" / ")"
             / "*" / "+" / "," / "-" / "." / "/"
             / ":" / ";" / "<" / "=" / ">" / "?" / "@"
             / "[" / "\" / "]" / "^" / "_" / "`"
             / "{" / "|" / "}" / "~"

FORMAT_END   = FORMAT_BEGIN
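The boundary check for constrained formatting can be sketched as below. The function is_constrained is a hypothetical name. The sketch relies on Python's string.punctuation, which holds the same 32 punctuation characters listed in FORMAT_BEGIN, and it assumes that the start and end of the text also count as valid boundaries.

```python
import string

# WSP plus the punctuation characters listed in FORMAT_BEGIN.
FORMAT_BOUNDARY = set(" \t") | set(string.punctuation)


def is_constrained(text: str, start: int, end: int) -> bool:
    """Check the characters around the marks at text[start] and text[end]."""
    before_ok = start == 0 or text[start - 1] in FORMAT_BOUNDARY
    after_ok = end == len(text) - 1 or text[end + 1] in FORMAT_BOUNDARY
    return before_ok and after_ok
```

In "a *bold* word" the marks at index 2 and 7 are surrounded by spaces, so the check passes; in "a*bold*b" the marks touch letters, so it fails and only the unconstrained form would apply.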

Unconstrained bold

TEXT_UNCONSTRAINED_BOLD = "**" TEXT "**"

Unconstrained italic

TEXT_UNCONSTRAINED_ITALIC = "__" TEXT "__"

Unconstrained mono

TEXT_UNCONSTRAINED_MONO = "``" TEXT "``"

Bold

TEXT_BOLD = FORMAT_BEGIN "*" TEXT "*" FORMAT_END

Italic

TEXT_ITALIC = FORMAT_BEGIN "_" TEXT "_" FORMAT_END

Monospace

TEXT_MONO = FORMAT_BEGIN "`" TEXT "`" FORMAT_END

Double quote curve

TEXT_QUOTE_DOUBLE = DQUOTE "`" TEXT "`" DQUOTE

Single quote curve

TEXT_QUOTE_SINGLE = "'`" TEXT "`'"

Subscript

TEXT_SUBSCRIPT = "~" WORD "~"

Superscript

TEXT_SUPERSCRIPT = "^" WORD "^"

Attribute reference

ATTR_REF = "{" META_KEY "}"

The attribute reference is replaced with the value of the document attribute if it exists; otherwise it is treated as normal text.

Passthrough

PASSTHROUGH_SINGLE = FORMAT_BEGIN "+" TEXT "+" FORMAT_END

PASSTHROUGH_DOUBLE = "++" TEXT "++"

PASSTHROUGH_TRIPLE = "+++" TEXT "+++"

PASSTHROUGH_BLOCK  = "++++" LF 1*LINE "++++" LF

PASSTHROUGH_MACRO  = "pass:" *(PASSMACRO_SUB) "[" TEXT "]"

PASSMACRO_SUB      = PASSMACRO_CHAR *("," PASSMACRO_CHAR)

PASSMACRO_CHAR     = "c" / "q" / "a" / "r" / "m" / "p"
                   / PASSMACRO_GROUP_NORMAL
                   / PASSMACRO_GROUP_VERBATIM

PASSMACRO_GROUP_NORMAL   = "n" ; equal to "c,q,a,r,m,p"

PASSMACRO_GROUP_VERBATIM = "v" ; equal to "c"

The "q" allows quotes substitutions.

The "m" allows macro substitutions.

The substitutions are applied in the order listed above.

URLs

The URL should end with "[]".

URL = URL_SCHEME "://" 1*VCHAR (
      "[" URL_TEXT ("," URL_ATTR_TARGET ) ("," URL_ATTR_ROLE ) "]" ) LWSP

URL_TEXT        = TEXT ("^")

URL_ATTR_TARGET = "window" "=" "_blank"

URL_ATTR_ROLE   = "role=" WORD *("," WORD)

Anchor

ANCHOR_LINE         = "[[" REF_ID "]]" LF

ANCHOR_LINE_SHORT   = "[#" REF_ID "]" LF

ANCHOR_INLINE       = "[[" REF_ID "]]" TEXT

ANCHOR_INLINE_SHORT = "[#" REF_ID "]#" TEXT "#" FORMAT_END

Cross references

CROSS_REF_INTERNAL  = "<<" ( REF_ID ("," REF_LABEL) / CROSS_REF_NATURAL ) ">>"

CROSS_REF_NATURAL   = BLOCK_TITLE

Rendered HTML,

<a href="#REF_ID">REF_LABEL / BLOCK_TITLE</a>

The CROSS_REF_NATURAL only works if the text contains at least one uppercase letter or space.
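The distinction between a reference by ID and a natural reference can be sketched as follows. The function classify_xref and the kind labels "id" and "natural" are hypothetical; the sketch assumes the label, if any, follows the first comma.

```python
def classify_xref(markup: str):
    """Return (kind, target, label) for "<<...>>", or None otherwise."""
    if not (markup.startswith("<<") and markup.endswith(">>")):
        return None
    target, _, label = markup[2:-2].partition(",")
    target = target.strip()
    label = label.strip() or None
    # A target with an uppercase letter or a space is a natural reference.
    is_natural = any(c.isupper() or c.isspace() for c in target)
    return ("natural" if is_natural else "id"), target, label
```

For example, "<<intro,Introduction>>" is classified as a reference by ID with a label, while "<<Getting started>>" is classified as a natural reference to a block title.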

Table

TABLE     = TABLE_SEP LF *ROW LF TABLE_SEP

TABLE_SEP = "|" 3*"="

ROW    = 1*CELL

CELL   = CELL_FORMAT "|" TEXT (LF)

CELL_FORMAT    = CELL_DUP / CELL_SPAN_COL / CELL_SPAN_ROW
               / CELL_ALIGN_HOR / CELL_ALIGN_VER / CELL_STYLE

CELL_DUP       = 1*DIGIT "*"

CELL_SPAN_COL  = 1*DIGIT "+"

CELL_SPAN_ROW  = "." 1*DIGIT "+"

CELL_ALIGN_HOR = "<" / "^" / ">"

CELL_ALIGN_VER = "." ("<" / "^" / ">")

CELL_STYLE     = "a" / "d" / "e" / "h" / "l" / "m" / "s" / "v"
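A cell format specifier, the text before the "|" of a cell, can be classified with one alternation. This is a sketch under the grammar as written, where CELL_FORMAT is exactly one of the alternatives; parse_cell_format and the group names are hypothetical.

```python
import re

CELL_FORMAT = re.compile(
    r"(?:(?P<dup>\d+)\*"         # CELL_DUP
    r"|(?P<colspan>\d+)\+"       # CELL_SPAN_COL
    r"|\.(?P<rowspan>\d+)\+"     # CELL_SPAN_ROW
    r"|(?P<halign>[<^>])"        # CELL_ALIGN_HOR
    r"|\.(?P<valign>[<^>])"      # CELL_ALIGN_VER
    r"|(?P<style>[adehlmsv]))"   # CELL_STYLE
)


def parse_cell_format(spec: str):
    """Return (kind, value) for a cell specifier, or None if invalid."""
    m = CELL_FORMAT.fullmatch(spec)
    if m is None:
        return None
    # Exactly one named group matched; lastgroup holds its name.
    return m.lastgroup, m.group(m.lastgroup)
```

The leading "." is what separates a row span (".2+") from a column span ("2+"), and a vertical alignment (".>") from a horizontal one (">").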

Footnote

Syntax,

"footnote:" [ REF_ID ] "[" STRING "]"

In asciidoctor, a footnote can be placed anywhere, even directly after a WORD without a space in between.

The REF_ID defines a unique ID for the footnote and can be used to reference a previous footnote. The first footnote with a given REF_ID should have the STRING defined. Subsequent footnotes with the same REF_ID should not have the STRING defined; if it is defined, the STRING is ignored.
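The REF_ID rule for footnotes can be sketched as a small registry. This is an illustration only; collect_footnotes is a hypothetical name, footnote numbers are assumed to start at 1 in order of first appearance, and the footnote text is assumed not to contain "]".

```python
import re

# "footnote:" [ REF_ID ] "[" STRING "]"
FOOTNOTE = re.compile(
    r"footnote:(?P<id>[A-Za-z]+[-_A-Za-z0-9]*)?\[(?P<text>[^\]]*)\]"
)


def collect_footnotes(text: str):
    """Return (notes, refs): the footnote texts and the number used at each call site."""
    notes = []   # footnote texts, in order of first appearance
    by_id = {}   # REF_ID -> footnote number
    refs = []    # footnote number for each "footnote:" occurrence
    for m in FOOTNOTE.finditer(text):
        ref_id, body = m.group("id"), m.group("text")
        if ref_id is not None and ref_id in by_id:
            # Repeated REF_ID: reuse the first footnote, ignore the text.
            refs.append(by_id[ref_id])
            continue
        notes.append(body)
        if ref_id is not None:
            by_id[ref_id] = len(notes)
        refs.append(len(notes))
    return notes, refs
```

A repeated REF_ID points back to the first footnote with that ID, whether its own brackets are empty or not.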

Inconsistencies and bugs in asciidoctor

Listing style "[listing]" followed by "...." becomes a listing block. Example,

[listing]
....
This block become listing.
....

Image width and height with non-digit characters are allowed. Example,

image::sunset.jpg[Text,a,b]

A link with "https" that ends with '.' works, but a "mailto" link that ends with '.' does not. Example,

https://asciidoctor.org.

mailto:me@example.com.

A block image with the "link" option does not work as expected,

image::{image-sunset}[Block image with attribute ref, link={test-url}].

The first table row spanning multiple lines is not considered a header, even though it is separated by an empty line. Example,

|===
|A1
|B1

|A2
|B2
|===