Regular Expressions in jq

About Regular Expressions

Regular expressions (regexes) are sequences of characters that specify a search pattern in text.

Learning regular expression syntax is beyond the scope of this topic. We will focus on the expressions that jq provides to utilize regexes.

Regex Flavour

Different tools implement different versions of regular expressions. jq incorporates the Oniguruma regex library, which is largely compatible with Perl v5.8 regexes.

The specific syntax used by jq version 1.7 can be found on the Oniguruma GitHub repo.

Caution

jq does not have any special syntax for regular expressions. They are simply expressed as strings. That means that any backslashes in the regular expression need to be escaped in the string.

For example, the digit character class (\d) must be written as "\\d".
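
The escaping rule can be seen in a few small checks. This is a minimal sketch that uses the test filter (described below) with made-up input strings; each regex backslash is doubled inside the jq string.

```jq
"Room 42" | test("\\d")        # => true   (regex \d: a digit)
"Room 42" | test("\\d{3}")     # => false  (no run of three digits)
"v1.7"    | test("1\\.7")      # => true   (regex 1\.7: a literal dot)
"v1x7"    | test("1\\.7")      # => false  ("x" is not a literal dot)
```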

Regex Functions

jq exposes regular expressions only through a small set of filters, described below.

Simple Matching

When you need to know if a string matches a pattern, use the test filter.

STRING | test(REGEX)
STRING | test(REGEX; FLAGS)
STRING | test([REGEX, FLAGS])

This filter outputs a boolean result.

"Hello World!" | test("W")    # => true
"Goodbye Mars" | test("W")    # => false

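The following sketch compares the call forms listed above and shows test used as a condition inside select. The input strings and the array of fruit names are illustrative assumptions, not part of the original examples.

```jq
"Hello World!" | test("w")           # => false (matching is case-sensitive by default)
"Hello World!" | test("w"; "i")      # => true  ("i" flag: ignore case)
"Hello World!" | test(["w", "i"])    # => true  (array form, same meaning)

# Because the result is a boolean, test works as a select condition.
["apple", "Banana", "cherry"] | map(select(test("an")))
# => ["Banana"]
```
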
Information About the Match

When you need to extract the substring that actually matched the pattern, use the match filter.

STRING | match(REGEX)
STRING | match(REGEX; FLAGS)
STRING | match([REGEX, FLAGS])

This filter outputs:

  • nothing if there was no match, or
  • an object containing various properties if there was a match.

This example looks for two identical consecutive vowels by using the backreference syntax, \1.

"Hello World!" | match("([aeiou])\\1")
# => empty

"Goodbye Mars" | match("([aeiou])\\1")
# => {
#      "offset": 1,
#      "length": 2,
#      "string": "oo",
#      "captures": [
#        {
#          "offset": 1,
#          "length": 1,
#          "string": "o",
#          "name": null
#        }
#      ]
#    }

The match filter returns an object for each match. This example shows the "g" flag in action to find all the vowels.

"Goodbye Mars" | match("[aeiou]"; "g")
# => { "offset": 1, "length": 1, "string": "o", "captures": [] }
#    { "offset": 2, "length": 1, "string": "o", "captures": [] }
#    { "offset": 6, "length": 1, "string": "e", "captures": [] }
#    { "offset": 9, "length": 1, "string": "a", "captures": [] }
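
Since match outputs ordinary objects, the usual path expressions can pull out individual properties. A short sketch, reusing the inputs from the examples above:

```jq
# The matched substring only
"Goodbye Mars" | match("([aeiou])\\1") | .string
# => "oo"

# The text of the first capture group
"Goodbye Mars" | match("([aeiou])\\1") | .captures[0].string
# => "o"

# The offsets of every vowel, collected into an array
"Goodbye Mars" | [ match("[aeiou]"; "g") | .offset ]
# => [1, 2, 6, 9]
```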

Captured Substrings

Similar to the match filter, the capture filter returns an object if there was a match.

STRING | capture(REGEX)
STRING | capture(REGEX; FLAGS)
STRING | capture([REGEX, FLAGS])

The returned object is a mapping of the named captures.

"JIRAISSUE-1234" | capture("(?<project>\\w+)-(?<issue_num>\\d+)")
# => {
#      "project": "JIRAISSUE",
#      "issue_num": "1234"
#    }
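
The sketch below shows a few ways the capture output might be used: reading one field and converting it to a number, handling an input with no match, and applying the "g" flag to get one object per match. The strings "no issue here" and "ABC-12 XYZ-7" are made-up inputs for illustration.

```jq
# Captured values are strings; convert them when a number is needed.
"JIRAISSUE-1234" | capture("(?<project>\\w+)-(?<issue_num>\\d+)") | .issue_num | tonumber
# => 1234

# No match means no output, so the array constructor yields an empty array.
"no issue here" | [ capture("(?<project>\\w+)-(?<issue_num>\\d+)") ]
# => []

# With the "g" flag, capture outputs one object per match.
"ABC-12 XYZ-7" | [ capture("(?<project>[A-Z]+)-(?<issue_num>\\d+)"; "g") | .project ]
# => ["ABC", "XYZ"]
```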

Just the Substrings

The scan filter is similar to match with the "g" flag.

STRING | scan(REGEX)
STRING | scan(REGEX; FLAGS)

# note, there is no scan([REGEX, FLAGS]) version, unlike other filters

scan will output a stream of substrings.

"Goodbye Mars" | scan("[aeiou]")
# => "o"
#    "o"
#    "e"
#    "a"

Use the [...] array constructor to collect the substrings into an array.

"Goodbye Mars" | [ scan("[aeiou]") ]
# => ["o", "o", "e", "a"]

Note

jq v1.6 does not implement the 2-argument scan filter, even though the version 1.6 manual says it does.
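
Two further points about scan can be sketched briefly. When the pattern contains capture groups, scan outputs an array of the captured strings for each match instead of the whole match. And where the 2-argument scan is unavailable, one possible workaround (an assumption about what suits your case, not an official recommendation) is to use match with the "g" flag and keep only the .string property.

```jq
# With capture groups, each output is an array of the groups.
"a1 b2" | [ scan("([a-z])(\\d)") ]
# => [["a", "1"], ["b", "2"]]

# A scan-like result with extra flags, built from match.
"Goodbye MARS" | [ match("[aeiou]"; "gi") | .string ]
# => ["o", "o", "e", "A"]
```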

Splitting a String

If you know the parts of the string you want to keep, use match or scan. If you know the parts that you want to discard, use split.

STRING | split(REGEX; FLAGS)

Caution

The 1-arity split filter treats its argument as a fixed string.

To use a regex with split, you must provide the 2nd argument; it's OK to use an empty string.

This example splits a string on arbitrary whitespace.

"first   second           third fourth" | split("\\s+"; "")
# => ["first", "second", "third", "fourth"]

Note

This is what happens if we forget the flags argument.

"first   second           third fourth" | split("\\s+")
# => ["first   second           third fourth"]

Only one result: the fixed string \s+ was not seen in the input.

The 1-arity split cannot handle arbitrary whitespace. Splitting on a space gives this result.

"first   second           third fourth" | split(" ")
# => ["first", "", "", "second", "", "", "", "",
#     "", "", "", "", "", "", "third", "fourth" ]
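
The regex form of split is also handy for delimiters with optional surrounding whitespace, and it accepts the same flags as the other filters. A minimal sketch with made-up inputs:

```jq
# Split on commas, discarding any whitespace around them.
"a, b,c ,  d" | split("\\s*,\\s*"; "")
# => ["a", "b", "c", "d"]

# The "i" flag makes the delimiter case-insensitive.
"oneXtwoxthree" | split("x"; "i")
# => ["one", "two", "three"]
```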

Substitutions

The sub and gsub filters can transform the input string, replacing matched portions of the input with a replacement string.

To replace just the first match, use sub. To replace all the matches, use gsub.

STRING | sub(REGEX; REPLACEMENT)
STRING | sub(REGEX; REPLACEMENT; FLAGS)
STRING | gsub(REGEX; REPLACEMENT)
STRING | gsub(REGEX; REPLACEMENT; FLAGS)

"Goodnight kittens. Goodnight mittens." | sub("night"; " morning")
# => "Good morning kittens. Goodnight mittens."

"Goodnight kittens. Goodnight mittens." | gsub("night"; " morning")
# => "Good morning kittens. Good morning mittens."

The replacement text can refer to the matched substrings; use named captures and string interpolation.

"Some 3-letter acronyms: gnu, csv, png"
| gsub( "\\b(?<tla>[[:alpha:]]{3})\\b";     # find words 3 letters long
        "\(.tla | ascii_upcase)" )          # upper-case the match
# => "Some 3-letter acronyms: GNU, CSV, PNG"
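
Named captures can also be used to rearrange parts of the input, and the 3-argument forms accept flags. The sketch below uses an illustrative date string and a variation of the sentence above; the capture names y, m, and d are arbitrary choices.

```jq
# Reorder the pieces of a date using named captures.
"2024-01-15" | sub("(?<y>\\d+)-(?<m>\\d+)-(?<d>\\d+)"; "\(.d)/\(.m)/\(.y)")
# => "15/01/2024"

# Replace every match regardless of case.
"Goodnight kittens. goodnight mittens." | gsub("goodnight"; "Hello"; "i")
# => "Hello kittens. Hello mittens."
```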

Flags

In all the above filters, FLAGS is a string consisting of zero or more of the supported flags.

  • g - Global search (find all matches, not just the first)
  • i - Case insensitive search
  • m - Multi line mode ('.' will match newlines)
  • n - Ignore empty matches
  • p - Both s and m modes are enabled
  • s - Single line mode ('^' -> '\A', '$' -> '\Z')
  • l - Find longest possible matches
  • x - Extended regex format (ignore whitespace and comments)
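
Flags are combined by placing their letters in one string. The following sketch, using made-up inputs, combines "g" with "i" and then shows the effect of "x" on a pattern that contains spaces.

```jq
# "g" and "i" together: every match, ignoring case.
"Anna and Bob" | [ match("a"; "gi") | .string ]
# => ["A", "a", "a"]

# Without "x" the spaces in the pattern are literal; with "x" they are ignored.
"JIRAISSUE-1234" | test("\\w+ - \\d+")         # => false
"JIRAISSUE-1234" | test("\\w+ - \\d+"; "x")    # => true
```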

For example, the x flag allows a pattern to be rewritten with whitespace and comments for readability:

"JIRAISSUE-1234" | capture("(?<project>\\w+)-(?<issue_num>\\d+)")

# or with Extended formatting

"JIRAISSUE-1234" | capture("
                     (?<project>   \\w+ )  # the Jira project
                     -                     # followed by a hyphen
                     (?<issue_num> \\d+ )  # followed by digits
                   "; "x")