Regular Expressions


About Regular Expressions

Regular expressions (regex) are a powerful tool for working with strings in Elixir. Elixir's regular expressions follow the PCRE (Perl Compatible Regular Expressions) specification. A pattern is first compiled and then used to match all or part of a string.

In Elixir, the most common way to create a regular expression is the ~r sigil. Sigils are syntactic shortcuts for common tasks in Elixir. Here, ~r is a shortcut for Regex.compile!/2.

```elixir
Regex.compile!("test") == ~r/test/
# => true
```
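If a pattern comes from runtime data and might be invalid, Regex.compile/2 (without the bang) returns a tagged tuple instead of raising. The following is a minimal sketch. It assumes the options are passed as a string of modifier letters (here "i"):

```elixir
# Regex.compile/2 returns {:ok, regex} for a valid pattern
{:ok, regex} = Regex.compile("test", "i")
IO.inspect("THIS IS A TEST" =~ regex)
# => true

# An unbalanced parenthesis is not a valid pattern, so we get {:error, reason}
{:error, _reason} = Regex.compile("(")

# Regex.compile!/2 raises Regex.CompileError for the same input
try do
  Regex.compile!("(")
rescue
  e in Regex.CompileError -> IO.puts("compile error: " <> Exception.message(e))
end
```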

The =~/2 operator performs a regex match on a string and returns a boolean.

```elixir
"this is a test" =~ ~r/test/
# => true
```
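Regex.match?/2 returns the same boolean. The =~ operator also accepts a plain string on its right-hand side, and in that case it checks for a substring. A short sketch:

```elixir
# Regex.match?/2 gives the same result as =~ with a regex
IO.inspect(Regex.match?(~r/test/, "this is a test"))
# => true

# With a plain string on the right, =~ checks whether it is a substring
IO.inspect("this is a test" =~ "test")
# => true

# A failed match returns false, not nil
IO.inspect("this is a test" =~ ~r/exam/)
# => false
```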

Regex syntax review

Some characters in a regular expression pattern have a special meaning. To match such a character literally, escape it with \, e.g. ~r/\?/.
Character classes (e.g. \d, \w) allow a pattern to match any one character from a set.
Alternations (|) allow a pattern to match one alternative or another.
Quantifiers ({N,M}, *, +, ?) allow a pattern to match a specified number of repetitions.
Groups (()) allow parts of a pattern to function as a unit.
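The following sketch exercises each of these pieces of syntax. The patterns and inputs are made up for illustration:

```elixir
# Escaping: \? matches a literal question mark at the end of the string
IO.inspect("Really?" =~ ~r/y\?$/)
# => true

# Character class \d with quantifier {3}: three digits in a row
IO.inspect("Room 101" =~ ~r/\d{3}/)
# => true

# Alternation inside a group, followed by an optional "s"
pets = ~r/^(cat|dog)s?$/
IO.inspect(Enum.map(["cat", "dogs", "cow"], &(&1 =~ pets)))
# => [true, true, false]

# The group makes + repeat "ha" as a whole
IO.inspect("hahaha" =~ ~r/^(ha)+$/)
# => true
```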

Captures

Regular expressions are also useful for extracting part of a string. This is called capturing. To capture part of a string, wrap it in a group (()) and call Regex.run/3. This returns the full match followed by each captured group.

```elixir
Regex.run(~r/Weight: (\d*)g/, "Weight: 150g")
# => ["Weight: 150g", "150"]
```
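Regex.run/3 accepts a :capture option that controls what is returned. Regex.scan/3 finds every match instead of only the first. The sketch below uses \d+ (rather than \d*) so that at least one digit is required:

```elixir
weight = ~r/Weight: (\d+)g/

# capture: :all_but_first drops the full match and keeps only the groups
IO.inspect(Regex.run(weight, "Weight: 150g", capture: :all_but_first))
# => ["150"]

# Regex.run/3 returns nil when the string does not match
IO.inspect(Regex.run(weight, "no weight here"))
# => nil

# Regex.scan/3 returns every match, each with its captures
IO.inspect(Regex.scan(~r/(\d+)g/, "150g flour, 20g sugar"))
# => [["150g", "150"], ["20g", "20"]]
```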

Captures are numbered starting at 1. You can reference them in the replacement string when replacing parts of a string with Regex.replace/4:

```elixir
Regex.replace(~r/Weight: (\d*)g/, "Weight: 150g", "Gewicht: \\1g")
# => "Gewicht: 150g"
```
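A replacement can also refer to the whole match with \\0. It can also be a function that receives the whole match followed by each capture. A sketch with made-up input:

```elixir
# \0 is the whole match, \1 the first capture group
IO.inspect(Regex.replace(~r/(\d+)g/, "150g", "[\\0] = \\1 grams"))
# => "[150g] = 150 grams"

# A function replacement gets the whole match, then one argument per capture;
# Regex.replace/4 replaces all matches by default
double = fn _whole, grams -> "#{String.to_integer(grams) * 2}g" end
IO.inspect(Regex.replace(~r/(\d+)g/, "150g flour, 20g sugar", double))
# => "300g flour, 40g sugar"
```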

Captures can also be named by placing ?<name> right after the opening parenthesis of a group. Use Regex.named_captures/3 to get a map of the named captures.

```elixir
Regex.named_captures(~r/Weight: (?<weight>\d*)g/, "Weight: 150g")
# => %{"weight" => "150"}
```
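A pattern can declare several named groups. Regex.named_captures/3 returns nil when nothing matches. A sketch with a hypothetical date format:

```elixir
date = ~r/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

# One map entry per named group
IO.inspect(Regex.named_captures(date, "Released on 2024-05-17"))
# => %{"day" => "17", "month" => "05", "year" => "2024"}

# nil is returned when there is no match
IO.inspect(Regex.named_captures(date, "no date"))
# => nil
```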

Modifiers

The behavior of a regular expression can be modified by appending special flags at the end of the regular expression, e.g. ~r/test/i.

caseless (i): case-insensitive matching

```elixir
"A" =~ ~r/a/
# => false
"A" =~ ~r/a/i
# => true
```

unicode (u): enables Unicode-specific patterns such as \p and makes character classes such as \w also match Unicode characters
```
"ö" =~ ~r/^\w$/
# => false
"ö" =~ ~r/^\w$/u
# => true
```
Other modifiers include dotall (s), multiline (m), extended (x), firstline (f), and ungreedy (U).
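The multiline (m) and dotall (s) modifiers change how ^, $, and . treat newlines. Modifiers can also be combined. A sketch on a two-line string:

```elixir
text = "first line\nsecond line"

# Without m, ^ only matches at the very start of the string
IO.inspect(text =~ ~r/^second/)
# => false
# With multiline (m), ^ also matches right after a newline
IO.inspect(text =~ ~r/^second/m)
# => true

# Without s, . does not match a newline
IO.inspect(text =~ ~r/line.second/)
# => false
# With dotall (s), . matches newlines too
IO.inspect(text =~ ~r/line.second/s)
# => true

# Modifiers combine: dotall plus caseless
IO.inspect(text =~ ~r/LINE.SECOND/si)
# => true
```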

Dynamically building regular expressions

Because the ~r sigil supports string interpolation (it is a shortcut for "pattern" |> Regex.compile!()), you can build a regular expression pattern dynamically. Interpolated text is not escaped, so here the interpolated "$" acts as an end-of-line anchor:

```elixir
anchor = "$"
regex = ~r/end of the line#{anchor}/
"end of the line?" =~ regex
# => false
"end of the line" =~ regex
# => true
```
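Interpolated text is inserted as-is. This means user input containing characters like + or ? changes the meaning of the pattern. Wrapping the input with Regex.escape/1 makes it match literally. In the sketch below, term stands in for hypothetical user input:

```elixir
# Hypothetical user input containing a regex metacharacter
term = "1+1"

# Interpolated unescaped, "+" acts as a quantifier, so "11" matches
unescaped = ~r/^#{term}$/
IO.inspect("11" =~ unescaped)
# => true

# Regex.escape/1 turns "1+1" into "1\\+1", so "+" is matched literally
escaped = ~r/^#{Regex.escape(term)}$/
IO.inspect({"11" =~ escaped, "1+1" =~ escaped})
# => {false, true}
```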

Regular expressions vs the `String` module

Although regular expressions are powerful, they are not always the right tool:

They must be compiled before use, which takes computation time and memory.
They may be slower than equivalent plain string functions.

As a rule of thumb, it is better to use the functions from the String module whenever possible.

```elixir
# Don't use regular expressions to check a suffix:
if "YELLING!" =~ ~r/!$/, do: "Whoa, chill out!"

# Use a string function:
if String.ends_with?("YELLING!", "!"), do: "Whoa, chill out!"
```
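The same advice applies to other everyday checks. The sketch below shows String functions that cover cases where a regex is often used. A regex is kept only for a pattern that is not a fixed string:

```elixir
message = "WHAT IS GOING ON?"

# Prefix, suffix, and substring checks have direct String equivalents
IO.inspect(String.starts_with?(message, "WHAT"))
# => true
IO.inspect(String.ends_with?(message, "?"))
# => true
IO.inspect(String.contains?(message, "GOING"))
# => true

# Splitting on a fixed separator does not need Regex.split/3 either
IO.inspect(String.split("a,b,c", ","))
# => ["a", "b", "c"]

# A regex still makes sense when matching a class of characters, e.g. any digit
IO.inspect(message =~ ~r/\d/)
# => false
```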
