TOML (Tom’s Obvious, Minimal Language) is a configuration file format designed to be easy to read and write, with a focus on simplicity and clarity. It is widely used in applications that need human-readable configuration. In this post, we will dive deep into TOML strings, their syntax, and how they can be used effectively.
TOML Strings
In TOML, strings are sequences of characters enclosed in quotes. They are used to represent textual data, such as names, URLs, file paths, or any other information that needs to be stored as plain text.
TOML supports the following types of strings:
- Basic Strings
- Multiline Basic Strings
- Literal Strings
- Multiline Literal Strings
Each of these string types serves a specific purpose and offers flexibility in handling text data. Let’s explore each type with examples and detailed explanations.
Basic Strings
Basic strings in TOML are enclosed in double quotes ("). They are ideal for simple textual data and may contain any Unicode character, including symbols and letters from any language, except those that must be escaped:
- The quotation mark (")
- The backslash (\)
- Control characters other than tab (U+0000 to U+0008, U+000A to U+001F, and U+007F), such as null, backspace, and newline
To include any of these characters in a string, escape them with a backslash (\). For example, write \" for a quotation mark and \\ for a backslash.
Syntax
key = "This is a basic string."
Features
- Escape sequences like \n, \t, and \" are supported.
- Suitable for most common cases where special characters need to be encoded.
str = "I'm a string. \"You can quote me\". Name\tJos\u00E9\nLocation\tSF."
Once parsed, this string has the following value (shown with the escapes rendered, purely for the sake of understanding).
I'm a string. "You can quote me". Name José
Location SF.
Here’s a simple explanation of the supported escape sequences and how they work:
- \b – Backspace (U+0008): Represents the backspace control character. It is rarely needed in configuration files.
- \t – Tab (U+0009): Adds a horizontal tab, useful for aligning text or creating indents.
- \n – Linefeed (U+000A): Moves the text to a new line, commonly used to break lines in a string.
- \f – Form Feed (U+000C): Used for page breaks in older printers or systems but is rarely seen today.
- \r – Carriage Return (U+000D): Moves the cursor to the beginning of the line, often used in combination with \n for new lines on Windows (\r\n).
- \" – Quote (U+0022): Allows you to include double quotes (") inside a string without ending it.
- \\ – Backslash (U+005C): Lets you include a backslash in the string.
- \uXXXX – Unicode (U+XXXX): Represents a Unicode character using 4 hexadecimal digits. For example, \u00E9 is the Unicode for “é.”
- \UXXXXXXXX – Unicode (U+XXXXXXXX): Represents a Unicode character using 8 hexadecimal digits, suitable for higher Unicode scalar values.
Key Notes:
- Any escape sequences not listed above are reserved, and using them should result in an error in TOML.
- Any Unicode character can be escaped using the \uXXXX or \UXXXXXXXX forms, as long as the code is a valid Unicode scalar value. For example, \u0041 represents “A”.
By using these escape sequences, you can handle formatting, symbols, and special characters more effectively in TOML strings, keeping your configuration files both clear and versatile.
Multiline Basic Strings
TOML makes it convenient to work with long passages of text or multi-line strings by using multi-line basic strings. These strings are enclosed with three double quotation marks (""") on each side. Here’s how it works:
Allowing Newlines:
Multi-line basic strings allow text to span multiple lines, making them ideal for storing long passages (like translations or formatted text).
str1 = """
Roses are red
Violets are blue"""
Trimming Leading Newline:
If there’s a newline immediately after the opening """, it will be trimmed, ensuring the text starts neatly.
Whitespace and Newline Handling:
Any other whitespace or newline characters within the text are preserved, allowing for proper formatting.
Platform-Specific Newline Normalization:
TOML parsers may normalize newlines to whatever suits their platform (\n on Unix-like systems, \r\n on Windows), so the exact newline in the parsed value can differ between parsers.
# On a Unix system, the above multi-line string will most likely be the same as:
str2 = "Roses are red\nViolets are blue"
# On a Windows system, it will most likely be equivalent to:
str3 = "Roses are red\r\nViolets are blue"
Line Ending Backslash (\):
If you want to write long strings without introducing extra whitespace, you can use a line ending backslash (\). When the last non-whitespace character on a line is an unescaped \, it is trimmed along with all following whitespace (spaces, tabs, and newlines) up to the next non-whitespace character or the closing delimiter.
# The following strings are byte-for-byte equivalent:
str1 = "The quick brown fox jumps over the lazy dog."
str2 = """
The quick brown \
fox jumps over \
the lazy dog."""
str3 = """\
The quick brown \
fox jumps over \
the lazy dog.\
"""
So, what does the rule say?
Any Unicode character is allowed, except those that must be escaped: the backslash (\) itself and control characters other than tab, line feed, and carriage return (U+0000 to U+0008, U+000B, U+000C, U+000E to U+001F, U+007F).
There is another rule about quotation marks: you can write one quotation mark, or two adjacent quotation marks, anywhere inside a multi-line basic string, including directly inside the delimiters. Only a run of three or more unescaped quotation marks is a problem, because it would be read as the closing delimiter.
Let’s explore this in depth with examples and explanations.
How Quotation Marks Work in Multi-Line Basic Strings
In TOML, multi-line basic strings use triple double quotes (""") to allow text to span multiple lines. However, using multiple consecutive quotation marks (") inside them can be tricky because it may conflict with the string delimiters. Let’s go through each example carefully.
Valid Example
str = """Here are two quotation marks: "". Simple enough."""
Here,
- The string is enclosed in triple double quotes ("""), so it’s a valid multi-line string.
- Inside the string, "" (two quotation marks) appear normally without causing any issues.
Invalid Example (incorrect syntax)
# str = """Here are three quotation marks: """."""
Why is this invalid?
- The first triple double quote (""") starts the string.
- The next three quotes (""") end the string too early.
- Then, TOML sees the leftover .""" as invalid syntax outside the string.
Solution: Use an escape sequence to avoid closing the string unintentionally.
Corrected Example (escaping quotes)
str = """Here are three quotation marks: ""\"."""
Here,
- The first two quotes "" appear normally.
- The third quote \" is escaped, preventing the string from closing too early.
- The final """ properly ends the string.
Including Many Quotes (Escaping for Clarity)
str = """Here are fifteen quotation marks: ""\"""\"""\"""\"""\"."""
Here,
- Inside the string, we want to include """"""""""""""" (15 double quotes).
- To avoid accidentally ending the string, we escape every third quotation mark (\").
- This allows TOML to correctly parse the string while still producing 15 quotes in its value.
Quotes Inside a Sentence
str = """"This," she said, "is just a pointless statement.""""
As you see here,
- The string starts and ends with triple double quotes ("""), the multi-line delimiters.
- The inner text contains regular double quotes ("This," and "is just a pointless statement."), which do not conflict with the string delimiters.
- The fourth quotation mark at the start and the first of the four at the end belong to the text, not the delimiters. TOML allows one or two quotation marks directly inside the delimiters, so neither closes the string early.
Alternative: Using Literal Strings to Avoid Escaping
If escaping feels complicated, TOML also supports multi-line literal strings, which use triple single quotes ('''). These do not process escape sequences, so backslashes and double quotes can be written exactly as they are. The only thing to avoid inside them is a run of three or more single quotes.
str_literal = '''Here are three quotation marks: """.'''
Why is this easier?
- No need to escape quotes with a backslash (\): everything is taken as is.
- Ideal for Windows paths, regex patterns, or text with many quotes.
Literal Strings
Single-Line Literal Strings
- Surrounded by single quotes (').
- No escaping is performed: a backslash (\) is just a backslash, and everything inside is taken as it is.
- Must be on a single line (no multi-line support).
# What you type is exactly what you get.
winpath = 'C:\Users\nodejs\templates'
winpath2 = '\\ServerX\admin$\system32\'
quoted = 'Amol "J" Pawar'
regex = '<\i\c*\s*>'
No need to escape backslashes (\) or quotes ("), making it useful for file paths and regex patterns.
Though it looks impressive, there is one limitation. Let’s see what it is and find a solution for it.
Limitation: Single Quote (') Cannot Be Used Inside
Since escaping is not allowed, you cannot include a single quote (') inside a single-quoted string.
This is NOT possible:
bad_example = 'I'm using TOML' # INVALID because of the `'` in "I'm"
The workaround for this is to use multi-line literal strings. Let’s see it in detail.
Multi-Line Literal Strings (''')
- Surrounded by triple single quotes (''').
- No escaping is needed, and backslashes (\) are taken as they are.
- Supports multiple lines while preserving whitespace and newlines.
- The first newline after ''' is trimmed (ignored).
regex2 = '''I [dw]on't need \d{2} apples'''
lines = '''
The first newline is
trimmed in raw strings.
All other whitespace
is preserved.
'''
Again, why is this useful?
- It allows single quotes (') inside the string, unlike single-line literal strings.
- Also, whitespace and newlines are preserved exactly as they appear.
Limitation: Three Consecutive Single Quotes (''') Are Not Allowed
A sequence of three or more single quotes (''') inside a multi-line literal string is not permitted because it would conflict with the string delimiters.
This is NOT valid:
# INVALID because fifteen consecutive apostrophes appear inside a multi-line literal string
apos15 = '''Here are fifteen apostrophes: ''''''''''''''''''
Workaround: Use a basic string, enclosed in double quotes ("), where apostrophes need no escaping:
apos15 = "Here are fifteen apostrophes: '''''''''''''''"
Handling Quotes in Multi-Line Strings
You can include one or two single quotes ('), but not three or more in a row.
quot15 = '''Here are fifteen quotation marks: """""""""""""""'''
str = ''''That,' she said, 'is still pointless.''''
The first and last quote (') inside the delimiters are just part of the string, so they don’t interfere.
Control Characters Are Not Allowed
Literal strings cannot contain control characters other than tab. Multi-line literal strings additionally accept newlines.
- This means you can’t store binary data directly in a literal string.
- Use Base64 or another encoding if needed.
Conclusion
TOML strings offer a flexible and straightforward way to manage textual data in configuration files. By understanding the different types of strings and their use cases, you can write clean and maintainable TOML files that suit your application’s needs. With this guide, you now have the knowledge to effectively utilize TOML strings in your projects.