kilabit.info
Build | sr.ht | GitHub | Twitter

This documentation provide summary and notes on implementation of Internet Message Format as defined in RFC 5322.

1. Syntax

message         =   (fields / obs-fields)
                    [CRLF body]

fields          =   *(field-name ":" (field-body / unstructured) CRLF)

field-name      =   1*ftext

field-body      =   (*([FWS] VCHAR) *WSP)

unstructured    =   (*([FWS] VCHAR) *WSP) / obs-unstruct

VCHAR           =   %d33-126

WSP             =   %d9 / %d32
                ; tab or space

ftext           =   %d33-57 / %d59-126
                ; Printable US-ASCII, except %d0-32 and %d58 (":")

body            =   (*(*998text CRLF) *998text) / obs-body

text            =   %d1-9 /            ; Characters excluding CR
                    %d11 /             ;  and LF
                    %d12 /
                    %d14-127
  • Each line MUST be no more than 998 characters, excluding CRLF.

  • Each line SHOULD be no more than 78 characters, excluding the CRLF.

  • CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body.

  • Each header field SHOULD be treated in its unfolded form for further syntactic and semantic evaluation.

  • "field-body" MUST NOT include CR and LF except when used in "folding" and "unfolding".

1.1. Folding White Space and Comments

CFWS            =   (1*([FWS] comment) [FWS]) / FWS

FWS             =   ([*WSP CRLF] 1*WSP) / obs-FWS
                ; Folding white space

comment         =   "(" *([FWS] ccontent) [FWS] ")"

ccontent        =   ctext / quoted-pair / comment

ctext           =   %d33-39 /          ; Printable US-ASCII
                    %d42-91 /          ;  characters not including
                    %d93-126 /         ;  "(", ")", or "\"
                    obs-ctext

quoted-pair     =   "\" (VCHAR / WSP) / obs-qp

Folding is a function to split a line into multiline with CRLF and WSP. For example, the following line,

"Subject: This is a test" CRLF

can be folded into,

"Subject: This" CRLF
WSP "is a test" CRLF

Unfolding is the process that reverse the output of folding into original input.

  • An unfolded header field has no length restriction and therefore may be indeterminately long.

  • Any CRLF that appears in FWS is semantically "invisible".

  • The "" in any quoted-pair is semantically "invisible".

  • Folding is permitted within the comment.

  • The parentheses and backslash characters may appear in a comment, so long as they appear as a quoted-pair.

  • Comment is not including the enclosing paretheses.

1.2. Atom

phrase          =   1*word / obs-phrase

word            =   atom / quoted-string

atom            =   [CFWS] 1*atext [CFWS]

dot-atom        =   [CFWS] dot-atom-text [CFWS]

dot-atom-text   =   1*atext *("." 1*atext)

atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                    "!" / "#" /        ;  characters not including
                    "$" / "%" /        ;  specials.  Used for atoms.
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"

specials        =   "(" / ")" /        ; Special characters that do
                    "<" / ">" /        ;  not appear in atext
                    "[" / "]" /
                    ":" / ";" /
                    "@" / "\" /
                    "," / "." /
                    DQUOTE
  • The optional comments and FWS surrounding the rest of the characters are not part of the atom.

1.3. Quoted Strings

quoted-string   =   [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]

qcontent        =   qtext / quoted-pair

qtext           =   %d33 /             ; Printable US-ASCII
                    %d35-91 /          ;  characters not including
                    %d93-126 /         ;  "\" or the quote character
                    obs-qtext

1.4. Date and Time Specification

Syntax,

date-time       =   [ day-of-week "," ] date time [CFWS]

day-of-week     =   ([FWS] day-name) / obs-day-of-week

day-name        =   "Mon" / "Tue" / "Wed" / "Thu" / "Fri" / "Sat" / "Sun"

date            =   day month year

day             =   ([FWS] 1*2DIGIT FWS) / obs-day

month           =   "Jan" / "Feb" / "Mar" / "Apr" /
                    "May" / "Jun" / "Jul" / "Aug" /
                    "Sep" / "Oct" / "Nov" / "Dec"

year            =   (FWS 4*DIGIT FWS) / obs-year

time            =   time-of-day zone

time-of-day     =   hour ":" minute [ ":" second ]

hour            =   2DIGIT / obs-hour

minute          =   2DIGIT / obs-minute

second          =   2DIGIT / obs-second

zone            =   (FWS ( "+" / "-" ) 4DIGIT) / obs-zone
  • The date and time-of-day SHOULD express local time.

  • The form "+0000" on zone SHOULD be used to indicate a time zone at Universal Time.

  • The form "-0000" on zone indicate that the time was generated on a system that may be in a local time zone other than Universal Time and that the date-time contains no information about the local time zone.

  • A date-time specification MUST be semantically valid.

  • The day-of-week MUST be the day implied by the date.

  • The numeric day-of-month MUST be between 1 and the number of days allowed for the specified month (in the specified year).

  • The time-of-day MUST be in the range 00:00:00 through 23:59:60 (the number of seconds allowing for a leap second.

  • The last two digits of the zone MUST be within the range 00 through 59.

1.5. Address Specification

An address may either be an individual mailbox, or a group of mailboxes.

Format,

address-list    =   (address *("," address)) / obs-addr-list

address         =   mailbox / group

group           =   display-name ":" [group-list] ";" [CFWS]

group-list      =   mailbox-list / CFWS / obs-group-list

mailbox-list    =   (mailbox *("," mailbox)) / obs-mbox-list

address         =   mailbox / group

mailbox         =   name-addr / addr-spec

name-addr       =   [display-name] angle-addr

angle-addr      =   [CFWS] "<" addr-spec ">" [CFWS] /
                    obs-angle-addr

display-name    =   phrase

addr-spec       =   local-part "@" domain

local-part      =   dot-atom / quoted-string / obs-local-part

domain          =   dot-atom / domain-literal / obs-domain

domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]

dtext           =   %d33-90 /          ; Printable US-ASCII
                    %d94-126 /         ;  characters not including
                    obs-dtext          ;  "[", "]", or "\"
  • dot-atom form SHOULD be used,

  • quoted-string form SHOULD NOT be used;

  • Comments and folding white space SHOULD NOT be used around the "@" in the addr-spec.

Format,

fields          =   *(trace
                      *optional-field /
                      *(resent-date /
                       resent-from /
                       resent-sender /
                       resent-to /
                       resent-cc /
                       resent-bcc /
                       resent-msg-id))
                    *(orig-date /
                    from /
                    sender /
                    reply-to /
                    to /
                    cc /
                    bcc /
                    message-id /
                    in-reply-to /
                    references /
                    subject /
                    comments /
                    keywords /
                    optional-field)
Field Min number Max number Notes

trace

0

unlimited

Block prepended - see 3.6.7

resent-date

0*

unlimited*

One per block, required if other resent fields are present - see 3.6.6

resent-from

0

unlimited*

One per block - see 3.6.6

resent-sender

0*

unlimited*

One per block, MUST occur with multi-address resent-from - see 3.6.6

resent-to

0

unlimited*

One per block - see 3.6.6

resent-cc

0

unlimited*

One per block - see 3.6.6

resent-bcc

0

unlimited*

One per block - see 3.6.6

resent-msg-id

0

unlimited*

One per block - see 3.6.6

orig-date

1

1

from

1

1

See sender and 3.6.2

sender

0*

1

MUST occur withmulti-address from - see 3.6.2

reply-to

0

1

to

0

1

cc

0

1

bcc

0

1

message-id

0*

1

SHOULD be present - see 3.6.4

in-reply-to

0*

1

SHOULD occur in some replies - see 3.6.4

references

0*

1

SHOULD occur in some replies - see 3.6.4

subject

0

1

comments

0

unlimited

keywords

0

unlimited

optional-field

0

unlimited

  • Header fields SHOULD NOT be reordered when a message is transported or transformed.

  • The trace header fields and resent header fields MUST NOT be reordered, and SHOULD be kept in blocks prepended to the message.

  • The only required header fields are the "Date" field and the originator address field(s) (which is "From", "Sender", and "Reply-To").

2.1. Date Field

The date and time at which the creator of the message indicated that the message was completed, not the time the message transferred.

orig-date       =   "Date:" date-time CRLF

2.2. Originator Fields

from            =   "From:" mailbox-list CRLF

sender          =   "Sender:" mailbox CRLF

reply-to        =   "Reply-To:" address-list CRLF
  • If the "From:" field contains more than one mailbox, then the sender field MUST appear in the message.

  • If the originator of the message can be indicated by a single mailbox and the author and transmitter are identical, the "Sender:" field SHOULD NOT be used. Otherwise, both fields SHOULD appear.

  • When the "Reply-To:" field is present, it indicates the address(es) to which the author of the message suggests that replies be sent.

  • In the absence of the "Reply-To:" field, replies SHOULD by default be sent to the mailbox(es) specified in the "From:" field unless otherwise specified by the person composing the reply.

  • In all cases, the "From:" field SHOULD NOT contain any mailbox that does not belong to the author(s) of the message.

2.3. Destination Fields

to  =   "To:" address-list CRLF

cc  =   "Cc:" address-list CRLF

bcc =   "Bcc:" [address-list / CFWS] CRLF

The "To:" field contains the address(es) of the primary recipient(s) of the message.

The "Cc:" field (where the "Cc" means "Carbon Copy" in the sense of making a copy on a typewriter using carbon paper) contains the addresses of others who are to receive the message, though the content of the message may not be directed at them.

The "Bcc:" field (where the "Bcc" means "Blind Carbon Copy") contains addresses of recipients of the message whose addresses are not to be revealed to other recipients of the message.

There are three ways in which the "Bcc:" field is used,

  1. The "Bcc:" line is removed even though all of the recipients (including those specified in the "Bcc:" field) are sent a copy of the message.

  2. Recipients specified in the "To:" and "Cc:" lines each are sent a copy of the message with the "Bcc:" line removed as above, but the recipients on the "Bcc:" line get a separate copy of the message containing a "Bcc:" line. (When there are multiple recipient addresses in the "Bcc:" field, some implementations actually send a separate copy of the message to each recipient with a "Bcc:" containing only the address of that particular recipient.)

  3. Since a "Bcc:" field may contain no addresses, a "Bcc:" field can be sent without any addresses indicating to the recipients that blind copies were sent to someone.

Which method to use with "Bcc:" fields is implementation dependent, but refer to the "Security Considerations" section of this document for a discussion of each.

2.4. Identification Field

Format,

message-id      =   "Message-ID:" msg-id CRLF

in-reply-to     =   "In-Reply-To:" 1*msg-id CRLF

references      =   "References:" 1*msg-id CRLF

msg-id          =   [CFWS] "<" id-left "@" id-right ">" [CFWS]

id-left         =   dot-atom-text / obs-id-left

id-right        =   dot-atom-text / no-fold-literal / obs-id-right

no-fold-literal =   "[" *dtext "]"
  • Every message SHOULD have a "Message-ID:" field.

  • Reply messages SHOULD have "In-Reply-To:" and "References:" fields.

msg-id is intended to be machine readable and not necessarily meaningful to humans.

A liberal syntax is given for the id-right; however, the use of a domain is RECOMMENDED.

The "In-Reply-To:" and "References:" fields are used when creating a reply to a message. "In-Reply-To:" field may be used to identify the message (or messages) to which the new message is a reply (one or more parent), while the "References:" field may be used to identify a "thread" of conversation.

Trying to form a "References:" field for a reply that has multiple parents is discouraged.

The message identifier (msg-id) itself MUST be a globally unique identifier for a message.

Semantically, the angle bracket characters are not part of the msg-id; the msg-id is what is contained between the two angle bracket characters.

2.5. Informational Fields

subject         =   "Subject:" unstructured CRLF

comments        =   "Comments:" unstructured CRLF

keywords        =   "Keywords:" phrase *("," phrase) CRLF

When used in a reply, the "Subject" body MAY start with the string "Re: " (an abbreviation of the Latin "in re", meaning "in the matter of") followed by the contents of the "Subject:" field body of the original message. If this is done, only one instance of the literal string "Re: " ought to be used since use of other strings or more than one instance can lead to undesirable consequences.

2.6. Resent Fields

Each of the resent fields corresponds to a particular field elsewhere in the syntax.

resent-date     =   "Resent-Date:" date-time CRLF

resent-from     =   "Resent-From:" mailbox-list CRLF

resent-sender   =   "Resent-Sender:" mailbox CRLF

resent-to       =   "Resent-To:" address-list CRLF

resent-cc       =   "Resent-Cc:" address-list CRLF

resent-bcc      =   "Resent-Bcc:" [address-list / CFWS] CRLF

resent-msg-id   =   "Resent-Message-ID:" msg-id CRLF
  • Resent fields SHOULD be added to any message that is reintroduced by a user into the transport system.

  • A separate set of resent fields SHOULD be added each time this is done.

  • All of the resent fields corresponding to a particular resending of the message SHOULD be grouped together.

  • Each new set of resent fields is prepended to the message; that is, the most recent set of resent fields appears earlier in the message.

  • No other fields in the message are changed when resent fields are added.

  • When resent fields are used, the "Resent-From:" and "Resent-Date:" fields MUST be sent.

  • The "Resent-Message-ID:" field SHOULD be sent.

  • "Resent-Sender:" SHOULD NOT be used if "Resent-Sender:" would be identical to "Resent-From:".

  • The "Resent-Message-ID:" field provides a unique identifier for the resent message.

2.7. Trace Fields

trace           =   [return] 1*received

return          =   "Return-Path:" path CRLF

path            =   angle-addr / ([CFWS] "<" [CFWS] ">" [CFWS])

received        =   "Received:" *received-token ";" date-time CRLF

received-token  =   word / angle-addr / addr-spec / domain

2.8. Optional Fields

The field names of any optional field MUST NOT be identical to any field name specified elsewhere in this document.

optional-field  =   field-name ":" unstructured CRLF

3. Obsolete Specification

3.1. Obsolete Date and Time

The syntax for the obsolete date format allows

  1. a 2 digit year in the date field, and

  2. alphabetic time zone specifiers

Where a two or three digit year occurs in a date, the year is to be interpreted as follows:

  1. If a two digit year is encountered whose value is between 00 and 49, the year is interpreted by adding 2000, ending up with a value between 2000 and 2049.

  2. If a two digit year is encountered with a value between 50 and 99, or any three digit year is encountered, the year is interpreted by adding 1900.

Obsolete zones,

EDT is semantically equivalent to -0400
EST is semantically equivalent to -0500
CDT is semantically equivalent to -0500
CST is semantically equivalent to -0600
MDT is semantically equivalent to -0600
MST is semantically equivalent to -0700
PDT is semantically equivalent to -0700
PST is semantically equivalent to -0800

However, because of the error in [RFC0822], any time zones SHOULD all be considered equivalent to "-0000" unless there is out-of-band information confirming their meaning.

3.2. Obsolete Addressing

There are four primary differences in addressing.

  1. mailbox addresses were allowed to have a route portion before the addr-spec when enclosed in "<" and ">". The route is simply a comma-separated list of domain names, each preceded by "@", and the list terminated by a colon.

  2. CFWS were allowed between the period-separated elements of local-part and domain (i.e., dot-atom was not used). In addition, local-part is allowed to contain quoted-string in addition to just atom.

  3. mailbox-list and address-list were allowed to have "null" members. That is, there could be two or more commas in such a list with nothing in between them, or commas at the beginning or end of the list.

  4. US-ASCII control characters and quoted-pairs were allowed in domain literals and are added here.

3.3. Obsolete Header Fields

  • Allows multiple occurrences of any of the fields.

  • Fields may occur in any order.

  • Any amount of white space is allowed before the ":" at the end of the field name.