Regular expressions (regexes) are sequences of characters that specify a search pattern in text.
Learning regular expression syntax is beyond the scope of this topic.
We will focus on the expressions that jq provides to utilize regexes.
Different tools implement different versions of regular expressions.
jq incorporates the Oniguruma regex library that is largely compatible with Perl v5.8 regexes.
The specific syntax used by jq version 1.7 can be found on the Oniguruma GitHub repo.
jq does not have any special syntax for regular expressions.
They are simply expressed as strings.
That means that any backslashes in the regular expression need to be escaped in the string.
For example, the digit character class (\d) must be written as "\\d".
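To see the escaping rule end to end, here is a small sketch that runs jq from a shell. It assumes jq 1.6 or later (built with regex support) is on your PATH; the helper name contains_digits is made up for illustration.

```shell
# Assumption: jq 1.6+ is installed. contains_digits is a hypothetical helper.
# Single quotes stop the shell from touching the backslashes, so jq
# receives the string literal "\\d+", which denotes the regex \d+.
contains_digits() {
  jq -n --arg s "$1" '$s | test("\\d+")'
}

contains_digits "room 101"    # prints: true
contains_digits "no numbers"  # prints: false
```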
Regular expressions in jq are limited to a set of filters.
When you need to know if a string matches a pattern, use the test filter.
STRING | test(REGEX)
STRING | test(REGEX; FLAGS)
STRING | test([REGEX, FLAGS])
This filter outputs a boolean result.
"Hello World!" | test("W") # => true
"Goodbye Mars" | test("W") # => false
When you need to extract the substring that actually matched the pattern, use the match filter.
STRING | match(REGEX)
STRING | match(REGEX; FLAGS)
STRING | match([REGEX, FLAGS])
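This filter outputs empty if there was no match, or an object describing the match (its offset, length, matched string, and captures) if there was one.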
This example looks for two identical consecutive vowels by using the backref syntax, \1.
"Hello World!" | match("([aeiou])\\1")
# => empty
"Goodbye Mars" | match("([aeiou])\\1")
# => {
# "offset": 1,
# "length": 2,
# "string": "oo",
# "captures": [
# {
# "offset": 1,
# "length": 1,
# "string": "o",
# "name": null
# }
# ]
# }
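Since a failed match produces empty rather than null, a fallback needs the alternative operator. The following is a minimal sketch, assuming jq 1.6 or later on your PATH; double_vowel is a hypothetical helper and "none" is an arbitrary placeholder value.

```shell
# Assumption: jq 1.6+ is installed. double_vowel is a hypothetical helper.
double_vowel() {
  # Take only the matched text; if match emits nothing, fall back to "none".
  jq -rn --arg s "$1" '$s | (match("([aeiou])\\1") | .string) // "none"'
}

double_vowel "Goodbye Mars"   # prints: oo
double_vowel "Hello World!"   # prints: none
```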
The match filter returns an object for each match.
This example shows the "g" flag in action to find all the vowels.
"Goodbye Mars" | match("[aeiou]"; "g")
# => { "offset": 1, "length": 1, "string": "o", "captures": [] }
# { "offset": 2, "length": 1, "string": "o", "captures": [] }
# { "offset": 6, "length": 1, "string": "e", "captures": [] }
# { "offset": 9, "length": 1, "string": "a", "captures": [] }
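Each match object can be reduced to the one field you care about. As a sketch (assuming jq 1.6 or later on your PATH, with vowel_offsets as a hypothetical helper), this collects only the positions of the vowels:

```shell
# Assumption: jq 1.6+ is installed. vowel_offsets is a hypothetical helper.
vowel_offsets() {
  # Gather the .offset of every global match into one array (-c = compact output).
  jq -nc --arg s "$1" '[ $s | match("[aeiou]"; "g") | .offset ]'
}

vowel_offsets "Goodbye Mars"   # prints: [1,2,6,9]
```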
Similar to the match filter, the capture filter returns an object if there was a match.
STRING | capture(REGEX)
STRING | capture(REGEX; FLAGS)
STRING | capture([REGEX, FLAGS])
The returned object is a mapping of the named captures.
"JIRAISSUE-1234" | capture("(?<project>\\w+)-(?<issue_num>\\d+)")
# => {
# "project": "JIRAISSUE",
# "issue_num": "1234"
# }
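Captured values are always strings, so numeric fields need converting before you do arithmetic on them. A minimal sketch, assuming jq 1.6 or later on your PATH; parse_issue is a hypothetical helper name:

```shell
# Assumption: jq 1.6+ is installed. parse_issue is a hypothetical helper.
parse_issue() {
  jq -nc --arg s "$1" '
    $s
    | capture("(?<project>\\w+)-(?<issue_num>\\d+)")
    | .issue_num |= tonumber   # the captured digits arrive as a string
  '
}

parse_issue "JIRAISSUE-1234"   # prints: {"project":"JIRAISSUE","issue_num":1234}
```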
The scan filter is similar to match with the "g" flag.
STRING | scan(REGEX)
STRING | scan(REGEX; FLAGS)
# note, there is no scan([REGEX, FLAGS]) version, unlike other filters
scan will output a stream of substrings.
"Goodbye Mars" | scan("[aeiou]")
# => "o"
# "o"
# "e"
# "a"
Use the [...] array constructor to capture the substrings.
"Goodbye Mars" | [ scan("[aeiou]") ]
# => ["o", "o", "e", "a"]
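When the regex contains capture groups, scan emits an array of the captured strings for each match instead of the whole matched substring. A short sketch of that behaviour, assuming jq 1.6 or later on your PATH; key_value_pairs is a hypothetical helper:

```shell
# Assumption: jq 1.6+ is installed. key_value_pairs is a hypothetical helper.
key_value_pairs() {
  # Two capture groups, so every match becomes a two-element array.
  jq -nc --arg s "$1" '[ $s | scan("(\\w+)=(\\d+)") ]'
}

key_value_pairs "a=1, b=22"   # prints: [["a","1"],["b","22"]]
```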
Note that jq v1.6 does not implement the 2-argument scan function, even though the version 1.6 manual says it does.
If you know the parts of the string you want to keep, use match or scan.
If you know the parts that you want to discard, use split.
STRING | split(REGEX; FLAGS)
The 1-arity split filter treats its argument as a fixed string.
To use a regex with split, you must provide the 2nd argument; it's OK to use an empty string.
This example splits a string on arbitrary whitespace.
"first   second           third fourth" | split("\\s+"; "")
# => ["first", "second", "third", "fourth"]
This is what happens if we forget the flags argument.
"first   second           third fourth" | split("\\s+")
# => ["first second third fourth"]
Only one result: the fixed string \s+ was not seen in the input.
The 1-arity split cannot handle arbitrary whitespace.
Splitting on a single space gives this result, with an empty string for every extra space.
"first   second           third fourth" | split(" ")
# => ["first", "", "", "second", "", "", "", "",
# "", "", "", "", "", "", "third", "fourth" ]
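The same 2-argument form handles other separators with optional padding. As a sketch, assuming jq 1.6 or later on your PATH (split_list is a hypothetical helper), this splits on commas and discards any whitespace around them:

```shell
# Assumption: jq 1.6+ is installed. split_list is a hypothetical helper.
split_list() {
  # The empty FLAGS string is what selects the regex version of split.
  jq -nc --arg s "$1" '$s | split("\\s*,\\s*"; "")'
}

split_list "a, b,c ,d"   # prints: ["a","b","c","d"]
```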
The sub and gsub filters can transform the input string, replacing matched portions of the input with a replacement string.
To replace just the first match, use sub.
To replace all the matches, use gsub.
STRING | sub(REGEX; REPLACEMENT)
STRING | sub(REGEX; REPLACEMENT; FLAGS)
STRING | gsub(REGEX; REPLACEMENT)
STRING | gsub(REGEX; REPLACEMENT; FLAGS)
"Goodnight kittens. Goodnight mittens." | sub("night"; " morning")
# => "Good morning kittens. Goodnight mittens."
"Goodnight kittens. Goodnight mittens." | gsub("night"; " morning")
# => "Good morning kittens. Good morning mittens."
The replacement text can refer to the matched substrings; use named captures and string interpolation.
"Some 3-letter acronyms: gnu, csv, png"
| gsub( "\\b(?<tla>[[:alpha:]]{3})\\b";   # find words 3 letters long
        "\(.tla | ascii_upcase)" )         # upper-case the match
# => "Some 3-letter acronyms: GNU, CSV, PNG"
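Named captures can also be rearranged in the replacement. Here is a minimal sketch that rewrites an ISO-style date as day/month/year, assuming jq 1.6 or later on your PATH; to_dmy is a hypothetical helper, and the capture names y, m and d are chosen for illustration.

```shell
# Assumption: jq 1.6+ is installed. to_dmy is a hypothetical helper.
to_dmy() {
  # Inside the replacement string, each named capture is available as a field of "."
  jq -rn --arg s "$1" '
    $s | sub("(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})"; "\(.d)/\(.m)/\(.y)")
  '
}

to_dmy "Released on 2023-09-06"   # prints: Released on 06/09/2023
```

Text outside the matched portion is left untouched, which is why "Released on " survives.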
In all the above filters, FLAGS is a string consisting of zero or more of the supported flags.
g - Global search (find all matches, not just the first)
i - Case insensitive search
m - Multi line mode ('.' will match newlines)
n - Ignore empty matches
p - Both s and m modes are enabled
s - Single line mode ('^' -> '\A', '$' -> '\Z')
l - Find longest possible matches
x - Extended regex format (ignore whitespace and comments)
For example:
"JIRAISSUE-1234" | capture("(?<project>\\w+)-(?<issue_num>\\d+)")
# or with Extended formatting
"JIRAISSUE-1234" | capture("
  (?<project>   \\w+ )   # the Jira project
  -                      # followed by a hyphen
  (?<issue_num> \\d+ )   # followed by digits
"; "x")