This document contains grammar of asciidoc document markup language based on Asciidoctor User Manual.
About implementation
We try to follow the document syntax rules, but there are some inconsistencies we found when the document parsed and rendered to HTML. For example, the current asciidoctor allow the following inline formatting,
_A `B_ C`
to be rendered into the following HTML tree,
<em>A <code>B</em> C</code>
This is of course rendered correctly when opened in web browser, but it seems break the tree. In the previous implementation, we able to break down it into the following tree,
<em> <code>B</code> </em> <code>C</code>
But its open many inline formatting permutations which make the code more complex than it should.
This implementation,
-
use the strict asciidoctor syntax rules which we define in this document.
-
minimize duplicate markup.
-
Support only "" "" syntax, drop "xref:" syntax
-
Common grammar
EMPTY = "" DQUOTE = %d34 ; " WORD = 1*VCHAR ; Sequence of visible character without ; white spaces. STRING = WORD *(WSP WORD) ; Sequence of word with spaces between them. LINE = STRING LF ; STRING that end with new line. TEXT = 1*LINE ; One or more LINE. REF_ID = 1*ALPHA *("-" / "_" / ALPHA / DIGIT)
Document header
Document header consist of title and optional authors, a revision, and zero or more attributes. The author and revision MUST be after title and in order. The document attributes can be in any order, after title, author or revision.
DOC_HEADER = [ "=" SP DOC_TITLE LF [ DOC_AUTHORS LF [ DOC_REVISION LF ]]] (*DOC_ATTRIBUTE) LF
An empty line mark as the end of document header.
Author information
DOC_AUTHORS = MAILBOX *( ";" MAILBOX ) MAILBOX = STRING [ "<" EMAIL ">" ] EMAIL = WORD "@" WORD "." 1*8ALPHA ; simplified syntax of email format.
Revision information
DOC_REVISION = DOC_REV_VERSION [ "," DOC_REV_DATE ] DOC_REV_VERSION = "v" 1*DIGIT "." 1*DIGIT "." 1*DIGIT DOC_REV_DATE = 1*2DIGIT 3*ALPHA 4*DIGIT
Attributes
There are also document attributes which affect how the document rendered,
DOC_ATTRIBUTE = ":" DOC_ATTR_KEY ":" *LINE ("\" *LINE) LF DOC_ATTR_KEY = ( "toc" / "sectanchors" / "sectlinks" / "imagesdir" / "data-uri" / *META_KEY ) LF META_KEY = 1*(META_KEY_CHAR / '_') *(META_KEY_CHAR / '-') META_KEY_CHAR = (A..Z | a..z | 0..9)
HTML format
HTML format for rendering section header,
<div id="header"> <h1>{DOC_TITLE}{DOC_TITLE_SEP} {SUBTITLE}</h1> <div class="details"> <span id="author" class="author">{DOC_AUTHORS}</span> <br> <span id="revnumber">{DOC_REV_VERSION} , </span> <span id="revdate">{DOC_REV_DATE} </span> </div> </div>
Document preamble
Any content after document title and before the new section is considered as document preamble and its rendered inside the "content", not "header".
HTML format,
<div id="content"> <div id="preamble"> <div class="sectionbody"> {DOC_PREAMBLE} </div> </div> ... </div>
Block
BLOCK_REF = "[#" REF_ID *["." RoleName] "]" LF
Attribute
BLOCK_ATTR = "[" ATTR_NAME ("=" ATTR_VALUE) *("," ATTR_OPT) "]" LF ATTR_NAME = WORD ATTR_VALUE = STRING ATTR_OPT = ATTR_NAME ("=") ATTR_VALUE)
Table of contents
The table of contents (ToC) will be generated if "toc" attribute is set in document header with the following syntax,
TOC_ATTR = ":toc:" (TOC_PLACEMENT / TOC_POSITION ) TOC_PLACEMENT = ("auto" / "preamble" / "macro") TOC_POSITION = ("left" / "right") TOC_MACRO = "toc::[]"
If toc placement is empty it default to "auto", and placed after document header. If toc is set to "preamble" it will be set after document preamble. If toc is set to "macro", it will be set after section title that have TOC_MACRO.
Title
By default the ToC element will have the title set to "Table of Contents". One can change the ToC title using attribute "toc-title",
TOC_TITLE = ":toc-title:" LINE
Levels
By default only section level 1 and 2 will be rendered. One can change it using the attribute "toclevels",
TOC_LEVELS = ":toclevels:" 1DIGIT
Sections
Sections or headers group one or more paragraphs or blocks. Each section is started with '=' character or '#' (markdown). There are six levels or sections that are allowed in asciidoc, any more than that will be considered as paragraph.
SECTION = [BLOCK_REF] 2*6(EQUAL/HASH) 1*WSP LINE LF
HTML format,
HTML class for section is sectN
, where N is the level, which is equal to
number of '=' minus 1.
<div class="sectN"> <hN>{WORD}</hN> <div class="sectionbody"> ... </div> </div>
Section Attributes
idprefix
":idprefix:" EMPTY / REF_ID
The idprefix must be ASCII string. It must start with "_", "-", or ASCII letters, otherwise the "_" will be prepended. If one of the character is not valid, it will replaced with "_".
idseparator
":idseparator:" EMPTY / "-" / "_" / ALPHA
The idseparator
can be empty or single ASCII character ("_" or "-",
ASCII letter, or digit).
It is used to replace invalid REF_ID character.
Comments
COMMENT_SINGLE = "//" LINE COMMENT_BLOCK = "////" LF *LINE "////" LF COMMENTS = *(COMMENT_SINGLE / COMMENT_BLOCK)
The comment line cannot start with spaces, due to Block literal.
Block listing
LISTING_STYLE = "[listing]" LF TEXT LF LISTING_BLOCK = "----" LF TEXT "----" LF
Block literal
LITERAL_PARAGRAPH = 1*WSP TEXT LITERAL_STYLE = "[literal]" LF TEXT LF LITERAL_BLOCK = "...." LF TEXT "...." LF
HTML format,
<div class="literalblock"> <div class="content"> <pre>{{TEXT}}</pre> </div> </div>
Substitution rules,
-
special characters: "<", ">", and "&"
-
callouts
Include Directive
INCLUDE_DIRECTIVE = "include::" PATH "[" ELEMENT_ATTRIBUTE "]" PATH = ABSOLUTE_PATH / RELATIVE_PATH ABSOLUTE_PATH = "/" WORD *( "/" WORD ) RELATIVE_PATH = ( "." / ".." ) "/" WORD * ( "/" WORD )
Images
Block image
BLOCK_IMAGE = "image::" URL "[" IMAGE_ATTRS "]" IMAGE_ATTRS = TEXT ("," IMAGE_WIDTH ("," IMAGE_HEIGHT)) *("," IMAGE_OPTS) IMAGE_OPTS = IMAGE_OPT_KEY "=" 1*VCHAR IMAGE_OPT_KEY = "title" / "float" / "align" / "role" / "link"
Inline image
IMAGE_INLINE = "image:" URL "[" (IMAGE_ATTRS) "]"
Video
BLOCK_VIDEO = "video::" (URL / WORD) "[" ( "youtube" / "vimeo" ) *(BLOCK_ATTR) "]"
Audio
BLOCK_AUDIO = "audio::" (URL / WORD) "[" ( "options" "=" DQUOTE *AUDIO_ATTR_OPTIONS DQUOTE ) "]" AUDIO_ATTR_OPTIONS = "autoplay" | "loop" | "controls" | "nocontrols"
Block attributes
BLOCK_ATTRS = BLOCK_ATTR *( "," BLOCK_ATTR ) BLOCK_ATTR = WORD "=" (DQUOTE) WORD (DQUOTE)
Inline formatting
There are two types of inline formatting: constrained and unconstrained. The constrained formatting only applicable if the previous character of syntax begin with non-alphanumeric and end with characters other than alpha-numeric and underscore.
FORMAT_BEGIN = WSP / "!" / DQUOTE / "#" / "$" / "%" / "&" / "'" / "(" / ")" / "*" / "+" / "," / "-" / "." / "/" / / ":" / ";" / "<" / "=" / ">" / "?" / "@" / "[" / "\" / "]" / "^" / "_" / "`" / "{" / "|" / "}" / "~" FORMAT_END = FORMAT_BEGIN
Unconstrained bold
TEXT_UNCONSTRAINED_BOLD = "**" TEXT "**"
Unconstrained italic
TEXT_UNCONSTRAINED_ITALIC = "__" TEXT "__"
Unconstrained mono
TEXT_UNCONSTRAINED_MONO = "``" TEXT "``"
Bold
TEXT_BOLD = FORMAT_BEGIN "*" TEXT "*" FORMAT_END
Italic
TEXT_ITALIC = FORMAT_BEGIN "_" TEXT "_" FORMAT_END
Monospace
TEXT_MONO = FORMAT_BEGIN "`" TEXT "`" FORMAT_END
Double quote curve
TEXT_QUOTE_DOUBLE = QUOTE "`" TEXT "`" QUOTE
Single quote curve
TEXT_QUOTE_SINGLE = "'`" TEXT "`'"
Subscript
TEXT_SUBSCRIPT = "~" WORD "~"
Superscript
TEXT_SUPERSCRIPT = "^" WORD "^"
Attribute reference
ATTR_REF = "{" META_KEY "}"
The attribute reference will be replace with document attributes, if its exist, otherwise it would be considered as normal text.
Passthrough
PASSTHROUGH_SINGLE = FORMAT_BEGIN "+" TEXT "+" FORMAT_END PASSTHROUGH_DOUBLE = "++" TEXT "++" PASSTHROUGH_TRIPLE = "+++" TEXT "+++" PASSTHROUGH_BLOCK = "++++" LF 1*LINE "++++" LF PASSTHROUGH_MACRO = "pass:" *(PASSMACRO_SUB) "[" TEXT "]" PASSMACRO_SUB = PASSMACRO_CHAR *("," PASSMACRO_CHAR) PASSMACRO_CHAR = "c" / "q" / "a" / "r" / "m" / "p" / PASSMACRO_GROUP_NORMAL / PASSMACRO_GROUP_VERBATIM PASSMACRO_GROUP_NORMAL = "n" ; equal to "c,q,r,m,p" PASSMACRO_GROUP_VERBATIM = "v" ; equal to "c"
The "c" allow special character substitutions.
The "q" allow quotes substitutions.
The "a" allow attributes references substitutions.
The "r" allow character replacement substitutions.
The "m" allow macro substitutions.
The "p" allow post-replacement substitutions.
The substitutions are applied in above order.
URLs
The URL should end with "[]".
URL = URL_SCHEME "://" 1*VCHAR ( "[" URL_TEXT ("," URL_ATTR_TARGET ) ("," URL_ATTR_ROLE ) "]" ) LWSP URL_TEXT = TEXT ("^") URL_ATTR_TARGET = "window" "=" "_blank" URL_ATTR_RILE = "role=" WORD *("," WORD)
Anchor
ANCHOR_LINE = "[[" REF_ID "]]" LF ANCHOR_LINE_SHORT = "[#" REF_ID "]" LF ANCHOR_INLINE = "[[" REF_ID "]]" TEXT ANCHOR_INLINE_SHORT = "[#" REF_ID "]#" TEXT "#" FORMAT_END.
Cross references
CROSS_REF_INTERNAL = "<<" REF_ID ("," REF_LABEL) / CROSS_REF_NATURAL ">>" CROSS_REF_NATURAL = BLOCK_TITLE
Rendered HTML,
<a href="#REF_ID">REF_LABEL / BLOCK_TITLE</a>
The CROSS_REF_NATURAL only works if the text contains at least one uppercase or space.
Table
TABLE = TABLE_SEP LF *ROW LF TABLE_SEP TABLE_SEP = "|" 3*"=" ROW = 1*CELL CELL = CELL_FORMAT "|" TEXT (LF) CELL_FORMAT = CELL_DUP / CELL_SPAN_COL/ CELL_SPAN_ROW / CELL_ALIGN_HOR / CELL_ALIGN_VER / CELL_STYLE CELL_DUP = 1*DIGIT "*" CELL_SPAN_COL = 1*DIGIT "+" CELL_SPAN_ROW = "." 1*DIGIT "+" CELL_ALIGN_HOR = "<" / "^" / ">" CELL_ALIGN_VER = "." ("<" / "^" / ">") CELL_STYLE = "a" / "d" / "e" / "h" / "l" / "m" / "s" / "v"
Footnote
Syntax,
"footnote:" [ REF_ID ] "[" STRING "]"
In asciidoctor, footnote can be placed anywhere, even after WORD without space in between.
The REF_ID, define the unique ID for footnote and can be used to reference the previous footnote. The first footnote with REF_ID, should have the STRING defined. The next footnote with the same REF_ID, should not have the STRING defined; if its defined, the STRING is ignored.
Inconsistencies and bugs in asciidoctor
Listing style "[listing]" followed by "…." is become listing block. Example,
[listing] .... This block become listing. ....
Image width and height with non-digits characters are allowed, Example,
image::sunset.jpg[Text,a,b]
Link with "https" end with '.' works, but "mailto" end with '.' is not working. Example,
https://asciidoctor.org. mailto:me@example.com.
Block image with "link" option does not work as expected,
image::{image-sunset}[Block image with attribute ref, link={test-url}].
First table row with multiple lines does not considered as header, even thought it separated by empty line. Example,
|=== |A1 |B1 |A2 |B2 |===