Regular expressions (regex) are a powerful tool for working with strings in Elixir. Regular expressions in Elixir follow the PCRE specification (Perl Compatible Regular Expressions). String patterns representing the regular expression's meaning are first compiled then used for matching all or part of a string.
In Elixir, the most common way to create regular expressions is using the ~r
sigil. Sigils provide syntactic sugar shortcuts for common tasks in Elixir. To match a string literal, we can use the string itself as a pattern following the sigil.
~r/test/
The =~/2
operator is useful to perform a regex match on a string to return a boolean
result.
"this is a test" =~ ~r/test/
# => true
Two notes about using sigils:
/
\
)Matching a range of characters using square brackets []
defines a character class. This will match any one character to the characters in the class. You can also specify a range of characters like a-z
, as long as the start and end represent a contiguous range of code points.
regex = ~r/[a-z][ADKZ][0-9][!?]/
"jZ5!" =~ regex
# => true
"jB5?" =~ regex
# => false
Shorthand character classes make the pattern more concise. For example:
\d
short for [0-9]
(any digit)\w
short for [A-Za-z0-9_]
(any 'word' character)\s
short for [ \t\r\n\f]
(any whitespace character)When a shorthand character class is used outside of a sigil, it must be escaped: "\\d"
Alternations use |
as a special character to denote matching one or another
regex = ~r/cat|bat/
"bat" =~ regex
# => true
"cat" =~ regex
# => true
Quantifiers allow for a repeating pattern in the regex. They affect the group preceding the quantifier.
{N, M}
where N
is the minimum number of repetitions, and M
is the maximum{N,}
match N
or more repetitions
{0,}
may also be written as *
: match zero-or-more repetitions{1,}
may also be written as +
: match one-or-more repetitions{,N}
match up to N
repetitionsRound brackets ()
are used to denote groups and captures. The group may also be captured in some instances to be returned for use. In Elixir, these may be named or un-named. Captures are named by appending ?<name>
after the opening parenthesis. Groups function as a single unit, like when followed by quantifiers.
regex = ~r/(h)at/
Regex.replace(regex, "hat", "\\1op")
# => "hop"
regex = ~r/(?<letter_b>b)/
Regex.scan(regex, "blueberry", capture: :all_names)
# => [["b"], ["b"]]
Anchors are used to tie the regular expression to the beginning or end of the string to be matched:
^
anchors to the beginning of the string$
anchors to the end of the stringBecause the ~r
is a shortcut for "pattern" |> Regex.escape() |> Regex.compile!()
, you may also use string interpolation to dynamically build a regular expression pattern:
anchor = "$"
regex = ~r/end of the line#{anchor}/
"end of the line?" =~ regex
# => false
"end of the line" =~ regex
# => true
You have been tasked to write a service which ingests events. Each event has a date associated with it, but you notice that 3 different formats are being submitted to your service's endpoint:
"01/01/1970"
"January 1, 1970"
"Thursday, January 1, 1970"
You can see there are some similarities between each of them, and decide to write some composable regular expression patterns.
Implement day/0
, month/0
, and year/0
to return a string pattern which, when compiled, would match the numeric components in "01/01/1970"
(dd/mm/yyyy
). The day and month may appear as 1
or 01
(left padded with zeroes).
"31" =~ DateParser.day() |> Regex.compile!()
# => true
"12" =~ DateParser.month() |> Regex.compile!()
# => true
"1970" =~ DateParser.year() |> Regex.compile!()
# => true
Implement day_names/0
and month_names/0
to return a string pattern which, when compiled, would match the named day of the week and the named month of the year respectively.
"Tuesday" =~ DateParser.day_names() |> Regex.compile!()
# => true
"June" =~ DateParser.month_names() |> Regex.compile!()
# => true
Implement capture_day/0
, capture_month/0
, capture_year/0
, capture_day_name/0
, capture_month_name/0
to return a string pattern which captures the respective components to the names: "day"
, "month"
, "year"
, "day_name"
, "month_name"
DateParser.capture_month_name()
|> Regex.compile!()
|> Regex.named_captures("December")
# => %{"month_name" => "December"}
Implement capture_numeric_date/0
, capture_month_name_date()
, and capture_day_month_name_date/0
to return a string pattern which captures the components from part 3 using the respective date format:
"01/01/1970"
"January 1, 1970"
"Thursday, January 1, 1970"
DateParser.capture_numeric_date()
|> Regex.compile!()
|> Regex.named_captures("01/01/1970")
# => %{"day" => "01", "month" => "01", "year" => "1970"}
Implement match_numeric_date/0
, match_month_name_date/0
, and match_day_month_name_date/0
to return a compiled regular expression that only matches the date, and which can also capture the components.
"Thursday, January 1, 1970 was the Unix epoch." =~ DateParser.match_day_month_name_date()
# => false
"Thursday, January 1, 1970" =~ DateParser.match_day_month_name_date()
# => true
DateParser.match_day_month_name_date()
|> Regex.named_captures("Thursday, January 1, 1970")
# => %{
# "day" => "1",
# "day_name" => "Thursday",
# "month_name" => "January",
# "year" => "1970"
# }
Sign up to Exercism to learn and master Elixir with 57 concepts, 160 exercises, and real human mentoring, all for free.