Package regexp offers support for regular expressions in Go.
The syntax of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages.
Both the search patterns and the input texts are interpreted as UTF-8.
When using backticks (`) to make strings, backslashes (\) don't have any special meaning and don't mark the beginning of special characters like tabs (\t) or newlines (\n):
"\t\n" // regular string literal with 2 characters: a tab and a newline
`\t\n` // raw string literal with 4 characters: two backslashes, a 't', and an 'n'
For this reason, backticks are the preferred way to write regular expressions, since we don't need to escape backslashes:
"\\" // string with a single backslash
`\\` // string with 2 backslashes
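The difference between the two literal forms can be checked directly. The following is a minimal sketch; the helper names literalLengths and samePattern are made up for illustration and are not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// literalLengths (hypothetical helper) returns the byte lengths of the
// interpreted literal "\t\n" and the raw literal `\t\n`.
func literalLengths() (interpreted, raw int) {
	return len("\t\n"), len(`\t\n`)
}

// samePattern (hypothetical helper) reports whether an escaped interpreted
// literal and a raw literal spell out the same pattern text.
func samePattern() bool {
	return "\\d+\\.\\d+" == `\d+\.\d+`
}

func main() {
	i, r := literalLengths()
	fmt.Println(i, r)          // 2 4
	fmt.Println(samePattern()) // true

	// The raw literal is easier to read when used as a pattern.
	re := regexp.MustCompile(`\d+\.\d+`)
	fmt.Println(re.MatchString("pi is 3.14")) // true
}
```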
RegExp type
To use a regular expression, we first must compile the string pattern.
Compilation here means taking the string pattern of the regular expression and converting it into an internal representation that is easier to work with.
We only need to compile each pattern once; after that, we can use the compiled version of the regular expression many times.
The type regexp.Regexp represents a compiled regular expression.
We can compile a string pattern into a regexp.Regexp using the function regexp.Compile.
This function returns nil and an error if compilation failed:
re, err := regexp.Compile(`(a|b)+`)
fmt.Println(re, err) // => (a|b)+ <nil>
re, err = regexp.Compile(`a|b)+`)
fmt.Println(re, err) // => <nil> error parsing regexp: unexpected ): `a|b)+`
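In a real program the error would usually be checked rather than printed. A minimal sketch of that, assuming a hypothetical wrapper named compilePattern that adds context to the error:

```go
package main

import (
	"fmt"
	"regexp"
)

// compilePattern is a hypothetical helper: it compiles pattern and wraps
// any compilation error with the offending pattern for easier debugging.
func compilePattern(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re, nil
}

func main() {
	for _, p := range []string{`(a|b)+`, `a|b)+`} {
		re, err := compilePattern(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("compiled:", re)
	}
}
```

This style is a reasonable fit when the pattern comes from user input or configuration, where an invalid pattern is an expected condition rather than a programming mistake.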
Function MustCompile is a convenient alternative to Compile:
re = regexp.MustCompile(`[a-z]+\d*`)
Using this function, there is no need to handle an error.
MustCompile should only be used when we know for sure that the pattern compiles, because otherwise the program will panic.
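One common place for MustCompile is a package-level variable holding a constant pattern, so the pattern is compiled once and reused on every call. A minimal sketch, where wordRe and countWords are illustrative names:

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRe is compiled once, when the package is initialized. The pattern is
// a constant known to be valid, so MustCompile is appropriate here.
var wordRe = regexp.MustCompile(`[a-z]+`)

// countWords (hypothetical helper) reuses the compiled pattern on each call.
func countWords(s string) int {
	return len(wordRe.FindAllString(s, -1))
}

func main() {
	fmt.Println(countWords("go is fun")) // 3
	fmt.Println(countWords("123 456"))   // 0
}
```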
There are 16 methods of Regexp that match a regular expression and identify the matched text.
Their names are matched by this regular expression:
Find(All)?(String)?(Submatch)?(Index)?
If All is present, the routine matches successive non-overlapping matches of the entire expression.
If String is present, the argument is a string; otherwise it is a slice of bytes; return values are adjusted as appropriate.
If Submatch is present, the return value is a slice identifying the successive submatches of the expression.
If Index is present, matches and submatches are identified by byte index pairs within the input string.
There are also methods for other tasks, such as replacing matches and splitting strings.
All in all, the regexp package defines more than 40 functions and methods.
We will demonstrate the use of a few methods below.
Please see the API documentation for details of these and other functions.
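To illustrate the naming scheme described above, here is a short sketch that calls several members of the Find family on the same input. The pattern and the sample text are chosen only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var re = regexp.MustCompile(`[a-z]+(\d*)`)

func main() {
	text := "a12 bc3"

	// No modifiers on a byte slice: leftmost match only.
	fmt.Println(string(re.Find([]byte(text)))) // a12

	// String: the argument and the result are strings.
	fmt.Println(re.FindString(text)) // a12

	// Index: the match is reported as a [start end] byte index pair.
	fmt.Println(re.FindStringIndex(text)) // [0 3]

	// All: every non-overlapping match; -1 means no limit.
	fmt.Println(re.FindAllString(text, -1)) // [a12 bc3]

	// Submatch: the whole match followed by each capturing group.
	fmt.Println(re.FindStringSubmatch(text)) // [a12 12]

	// All + Submatch + Index combined.
	fmt.Println(re.FindAllStringSubmatchIndex(text, -1)) // [[0 3 1 3] [4 7 6 7]]
}
```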
MatchString Examples
Method MatchString reports whether a string contains any match of a regular expression.
re = regexp.MustCompile(`[a-z]+\d*`)
b = re.MatchString("[a12]") // => true
b = re.MatchString("12abc34(ef)") // => true
b = re.MatchString(" abc!") // => true
b = re.MatchString("123 456") // => false
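Because MatchString looks for a match anywhere in the input, a pattern has to be anchored with ^ and $ when the whole string should match. A small sketch of the difference, with the variable names anywhere and whole chosen for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// anywhere matches if the pattern occurs somewhere in the input.
	anywhere = regexp.MustCompile(`[a-z]+\d*`)
	// whole matches only if the entire input fits the pattern.
	whole = regexp.MustCompile(`^[a-z]+\d*$`)
)

func main() {
	for _, s := range []string{"abc34", "12abc34(ef)"} {
		fmt.Println(anywhere.MatchString(s), whole.MatchString(s))
	}
	// Output:
	// true true
	// true false
}
```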
FindString Examples
Method FindString returns a string holding the text of the leftmost match of the regular expression.
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.FindString("[a12]") // => "a12"
s = re.FindString("12abc34(ef)") // => "abc34"
s = re.FindString(" abc!") // => "abc"
s = re.FindString("123 456") // => ""
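An empty result from FindString can mean either "no match" or "a match of the empty string". When the distinction matters, FindStringIndex can be used instead, since it returns nil only when there is no match. A minimal sketch, where hasMatch is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	optional = regexp.MustCompile(`\d*`) // can match the empty string
	required = regexp.MustCompile(`\d+`) // needs at least one digit
)

// hasMatch (hypothetical helper) reports whether re matches anywhere in s,
// even if the match is empty.
func hasMatch(re *regexp.Regexp, s string) bool {
	return re.FindStringIndex(s) != nil
}

func main() {
	fmt.Printf("%q %v\n", optional.FindString("abc"), hasMatch(optional, "abc")) // "" true
	fmt.Printf("%q %v\n", required.FindString("abc"), hasMatch(required, "abc")) // "" false
}
```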
FindStringSubmatch Examples
Method FindStringSubmatch returns a slice of strings holding the text of the leftmost match of the regular expression and the matches, if any, of its subexpressions.
This can be used to identify the strings matching capturing groups.
A return value of nil indicates no match.
re = regexp.MustCompile(`[a-z]+(\d*)`)
sl = re.FindStringSubmatch("[a12]") // => []string{"a12","12"}
sl = re.FindStringSubmatch("12abc34(ef)") // => []string{"abc34","34"}
sl = re.FindStringSubmatch(" abc!") // => []string{"abc",""}
sl = re.FindStringSubmatch("123 456") // => <nil>
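A typical use of FindStringSubmatch is pulling fields out of a structured string while checking for the nil result first. The sketch below assumes a made-up key=value format and a hypothetical helper named parsePair:

```go
package main

import (
	"fmt"
	"regexp"
)

var pairRe = regexp.MustCompile(`^(\w+)=(\w*)$`)

// parsePair (hypothetical helper) splits "key=value" into its two parts.
// ok is false when the input does not have that shape.
func parsePair(s string) (key, value string, ok bool) {
	m := pairRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	// m[0] is the whole match; m[1] and m[2] are the capturing groups.
	return m[1], m[2], true
}

func main() {
	for _, s := range []string{"port=8080", "debug=", "nonsense"} {
		key, value, ok := parsePair(s)
		fmt.Printf("%q %q %v\n", key, value, ok)
	}
	// Output:
	// "port" "8080" true
	// "debug" "" true
	// "" "" false
}
```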
ReplaceAllString Examples
Method re.ReplaceAllString(src, repl) returns a copy of src, replacing matches of the regular expression re with the replacement string repl.
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.ReplaceAllString("[a12]", "X") // => "[X]"
s = re.ReplaceAllString("12abc34(ef)", "X") // => "12X(X)"
s = re.ReplaceAllString(" abc!", "X") // => " X!"
s = re.ReplaceAllString("123 456", "X") // => "123 456"
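The replacement string may also refer to capturing groups: inside repl, ${1}, ${2}, and so on expand to the text of the corresponding group. A short sketch, where partsRe and swapParts are illustrative names. It uses the braced form because $1x would be read as a group named "1x".

```go
package main

import (
	"fmt"
	"regexp"
)

var partsRe = regexp.MustCompile(`([a-z]+)(\d+)`)

// swapParts (hypothetical helper) moves the digits in front of the letters
// for every match of partsRe.
func swapParts(s string) string {
	return partsRe.ReplaceAllString(s, "${2}${1}")
}

func main() {
	fmt.Println(swapParts("abc34")) // 34abc
	fmt.Println(swapParts("[a12]")) // [12a]

	// ReplaceAllLiteralString inserts repl as-is, without expanding $ signs.
	fmt.Println(partsRe.ReplaceAllLiteralString("abc34", "${2}${1}")) // ${2}${1}
}
```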
Split Examples
Method re.Split(s, n) slices a text s into substrings separated by the expression and returns a slice of the substrings between those expression matches.
The count n determines the maximum number of substrings to return.
If n < 0, the method returns all substrings.
re = regexp.MustCompile(`[a-z]+\d*`)
sl = re.Split("[a12]", -1) // => []string{"[","]"}
sl = re.Split("12abc34(ef)", 2) // => []string{"12","(ef)"}
sl = re.Split(" abc!", -1) // => []string{" ","!"}
sl = re.Split("123 456", -1) // => []string{"123 456"}
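The effect of the count n is easiest to see by splitting the same input with different values. In this sketch the separator pattern and the name sepRe are chosen for illustration. With n > 0 the last element holds the unsplit remainder, and with n == 0 the result is nil.

```go
package main

import (
	"fmt"
	"regexp"
)

// sepRe matches a comma followed by optional whitespace.
var sepRe = regexp.MustCompile(`,\s*`)

func main() {
	s := "a, b,c"
	fmt.Printf("%q\n", sepRe.Split(s, -1)) // ["a" "b" "c"]
	fmt.Printf("%q\n", sepRe.Split(s, 2))  // ["a" "b,c"]
	fmt.Printf("%q\n", sepRe.Split(s, 0))  // []
}
```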
The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input.
Package regexp implements RE2 regular expressions (except for \C).
The syntax is largely compatible with PCRE ("Perl Compatible Regular Expressions"), but there are some differences.
Please see the "Caveat" section in this article for details.
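As one illustration of such differences, RE2 syntax does not include backreferences or lookahead assertions, so patterns that use them fail to compile in Go. A minimal sketch, where compiles is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles (hypothetical helper) reports whether pattern is accepted
// by regexp.Compile.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`(a|b)+`)) // true
	fmt.Println(compiles(`(a)\1`))  // false: backreference
	fmt.Println(compiles(`a(?=b)`)) // false: lookahead
}
```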