Package regexp offers support for regular expressions in Go.
The syntax of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages.
Both the search patterns and the input texts are interpreted as UTF-8.
When using backticks (`) to make strings, backslashes (\) don't have any special meaning and don't mark the beginning of special characters like tabs (\t) or newlines (\n):
"\t\n" // regular string literal with 2 characters: a tab and a newline
`\t\n` // raw string literal with 4 characters: two backslashes, a 't', and an 'n'
For this reason, backticks are the preferred way to write regular expressions, as backslashes do not need to be escaped:
"\\" // string with a single backslash
`\\` // string with 2 backslashes
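To see the difference in practice, the following minimal sketch compares the lengths of the two literal forms and checks that the same pattern, written once with escaped backslashes and once with backticks, compiles to the same expression. The function name literalLengths and the sample pattern are only illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// literalLengths returns the length of the interpreted literal "\t\n"
// and the length of the raw literal `\t\n`.
func literalLengths() (int, int) {
	return len("\t\n"), len(`\t\n`)
}

func main() {
	interpreted, raw := literalLengths()
	fmt.Println(interpreted, raw) // => 2 4

	// The same pattern (one or more digits, then a literal dot), written twice.
	escaped := regexp.MustCompile("\\d+\\.")
	backticks := regexp.MustCompile(`\d+\.`)
	fmt.Println(escaped.String() == backticks.String()) // => true
}
```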
Regexp type
To use a regular expression, we first must compile the string pattern.
Compilation here means taking the string pattern of the regular expression and converting it into an internal representation that is easier to work with.
We only need to compile each pattern once. After that, we can use the compiled version of the regular expression many times.
The type regexp.Regexp represents a compiled regular expression.
We can compile a string pattern into a regexp.Regexp using the function regexp.Compile.
This function returns nil and an error if compilation failed:
re, err := regexp.Compile(`(a|b)+`)
fmt.Println(re, err) // => (a|b)+ <nil>
re, err = regexp.Compile(`a|b)+`)
fmt.Println(re, err) // => <nil> error parsing regexp: unexpected ): `a|b)+`
Function MustCompile is a convenient alternative to Compile:
re = regexp.MustCompile(`[a-z]+\d*`)
Using this function, there is no need to handle an error.
MustCompile should only be used when we know for sure the pattern does compile, as otherwise the program will panic.
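A common way to combine "compile once" with MustCompile is a package-level variable, while patterns that are only known at run time go through Compile. The sketch below illustrates this under that assumption; the names wordWithDigits, hasWord and compileUserPattern are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordWithDigits is compiled once, when the package is initialized.
// The pattern is a constant known to be valid, so MustCompile will not panic.
var wordWithDigits = regexp.MustCompile(`[a-z]+\d*`)

// hasWord reuses the compiled expression on every call.
func hasWord(s string) bool {
	return wordWithDigits.MatchString(s)
}

// compileUserPattern handles a pattern that is only known at run time,
// so it uses Compile and hands the error back to the caller.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile(pattern)
}

func main() {
	fmt.Println(hasWord("abc1"), hasWord("123")) // => true false

	_, err := compileUserPattern(`a|b)+`)
	fmt.Println(err != nil) // => true
}
```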
There are 16 methods of Regexp that match a regular expression and identify the matched text.
Their names are matched by this regular expression:
Find(All)?(String)?(Submatch)?(Index)?
If All is present, the routine matches successive non-overlapping matches of the entire expression.
If String is present, the argument is a string; otherwise it is a slice of bytes; return values are adjusted as appropriate.
If Submatch is present, the return value is a slice identifying the successive submatches of the expression.
If Index is present, matches and submatches are identified by byte index pairs within the input string.
There are also methods for other tasks, such as replacing matches (for example ReplaceAllString) and splitting strings (for example Split).
All in all, the regexp package defines more than 40 functions and methods.
We will demonstrate the use of a few methods below.
Please see the API documentation for details of these and other functions.
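As a rough illustration of the naming scheme, the sketch below calls four members of the Find family on the same input. The variable name wordRe and the sample input are chosen only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRe matches lowercase letters followed by optional digits;
// the digits form capturing group 1.
var wordRe = regexp.MustCompile(`[a-z]+(\d*)`)

func main() {
	input := "ab12 cd"

	// All: every non-overlapping match; -1 means "no limit".
	fmt.Println(wordRe.FindAllString(input, -1)) // => [ab12 cd]

	// Index: byte offsets [start end] of the leftmost match.
	fmt.Println(wordRe.FindStringIndex(input)) // => [0 4]

	// Submatch + Index: offsets of the match, then of each capturing group.
	fmt.Println(wordRe.FindStringSubmatchIndex(input)) // => [0 4 2 4]

	// No String in the name: the argument and the result are byte slices.
	fmt.Printf("%s\n", wordRe.Find([]byte(input))) // => ab12
}
```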
MatchString Examples
Method MatchString reports whether a string contains any match of a regular expression.
re = regexp.MustCompile(`[a-z]+\d*`)
b = re.MatchString("[a12]") // => true
b = re.MatchString("12abc34(ef)") // => true
b = re.MatchString(" abc!") // => true
b = re.MatchString("123 456") // => false
FindString Examples
Method FindString returns a string holding the text of the leftmost match of the regular expression.
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.FindString("[a12]") // => "a12"
s = re.FindString("12abc34(ef)") // => "abc34"
s = re.FindString(" abc!") // => "abc"
s = re.FindString("123 456") // => ""
FindStringSubmatch Examples
Method FindStringSubmatch returns a slice of strings holding the text of the leftmost match of the regular expression and the matches, if any, of its subexpressions.
This can be used to identify the strings matching capturing groups.
A return value of nil indicates no match.
re = regexp.MustCompile(`[a-z]+(\d*)`)
sl = re.FindStringSubmatch("[a12]") // => []string{"a12","12"}
sl = re.FindStringSubmatch("12abc34(ef)") // => []string{"abc34","34"}
sl = re.FindStringSubmatch(" abc!") // => []string{"abc",""}
sl = re.FindStringSubmatch("123 456") // => <nil>
ReplaceAllString Examples
Method re.ReplaceAllString(src,repl) returns a copy of src, replacing matches of the regular expression re with the replacement string repl.
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.ReplaceAllString("[a12]", "X") // => "[X]"
s = re.ReplaceAllString("12abc34(ef)", "X") // => "12X(X)"
s = re.ReplaceAllString(" abc!", "X") // => " X!"
s = re.ReplaceAllString("123 456", "X") // => "123 456"
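The replacement string is not always taken literally: in the standard library, a `$` sign inside repl refers to a capturing group, so `${1}` stands for the text of the first group. The sketch below swaps the letters and digits of each match and contrasts this with ReplaceAllLiteralString, which inserts repl unchanged. The names lettersDigits, swapGroups and replaceLiteral are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// lettersDigits captures the letters in group 1 and the digits in group 2.
var lettersDigits = regexp.MustCompile(`([a-z]+)(\d*)`)

// swapGroups puts the digits of each match in front of its letters.
func swapGroups(s string) string {
	return lettersDigits.ReplaceAllString(s, "${2}${1}")
}

// replaceLiteral inserts the text "$1" as is, without group expansion.
func replaceLiteral(s string) string {
	return lettersDigits.ReplaceAllLiteralString(s, "$1")
}

func main() {
	fmt.Println(swapGroups("[a12]"))     // => [12a]
	fmt.Println(replaceLiteral("[a12]")) // => [$1]
}
```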
Split Examples
Method re.Split(s,n) slices a text s into substrings separated by the expression and returns a slice of the substrings between those expression matches.
The count n determines the maximal number of substrings to return.
If n<0, the method returns all substrings.
re = regexp.MustCompile(`[a-z]+\d*`)
sl = re.Split("[a12]", -1) // => []string{"[","]"}
sl = re.Split("12abc34(ef)", 2) // => []string{"12","(ef)"}
sl = re.Split(" abc!", -1) // => []string{" ","!"}
sl = re.Split("123 456", -1) // => []string{"123 456"}
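The effect of the count n is easiest to see by splitting the same text with different values. In the sketch below (the variable name sep is only illustrative), a positive n leaves the rest of the text unsplit in the last substring, and n equal to 0 yields a nil slice.

```go
package main

import (
	"fmt"
	"regexp"
)

// sep is the expression whose matches act as separators.
var sep = regexp.MustCompile(`[a-z]+\d*`)

func main() {
	s := "12abc34(ef)"

	fmt.Printf("%q\n", sep.Split(s, -1)) // => ["12" "(" ")"]
	fmt.Printf("%q\n", sep.Split(s, 2))  // => ["12" "(ef)"]
	fmt.Printf("%q\n", sep.Split(s, 1))  // => ["12abc34(ef)"]
	fmt.Println(sep.Split(s, 0) == nil)  // => true
}
```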
This exercise addresses the parsing of log files.
After a recent security review you have been asked to clean up the organization's archived log files.
All strings passed to the functions are guaranteed to be non-null and without leading and trailing spaces.
You need some idea of how many log lines in your archive do not comply with current standards. You believe that a simple test reveals whether a log line is valid. To be considered valid a line should begin with one of the following strings:
Implement the IsValidLine function to return false if a string is not valid, otherwise true.
IsValidLine("[ERR] A good error here")
// => true
IsValidLine("Any old [ERR] text")
// => false
IsValidLine("[BOB] Any old text")
// => false
A new team has joined the organization, and you find their log files are using a strange separator for "fields". Instead of something sensible like a colon ":", they use a string such as "<--->" or "<=>" (because it's prettier). In fact, the separator can be any string that starts with "<", ends with ">", and has any combination of the characters "~", "*", "=" and "-" in between.
Implement the SplitLogLine function that takes a line and returns a slice of strings, each of which contains a field.
SplitLogLine("section 1<*>section 2<~~~>section 3")
// => []string{"section 1", "section 2", "section 3"}
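One possible way to approach this task is to describe the separator as a regular expression and hand it to Split. This is only a sketch, not a reference solution; in particular, it assumes that an empty combination ("<>") also counts as a separator, and the variable name separator is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// separator matches "<", then any run of "~", "*", "=" or "-", then ">".
// Assumption: an empty run ("<>") also counts as a separator.
var separator = regexp.MustCompile(`<[~*=-]*>`)

// SplitLogLine returns the fields of a log line.
func SplitLogLine(line string) []string {
	return separator.Split(line, -1)
}

func main() {
	fmt.Printf("%q\n", SplitLogLine("section 1<*>section 2<~~~>section 3"))
	// => ["section 1" "section 2" "section 3"]
}
```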
password in quoted text
The team needs to know about references to passwords in quoted text so that they can be examined manually.
Implement the CountQuotedPasswords function to provide an indication of the likely scale of the manual exercise.
Identify log lines where the string "password", which may be in any combination of upper or lower case, is surrounded by quotation marks. You should account for the possibility of additional content between the quotation marks before and after "password". Each line will contain at most two quotation marks.
Lines passed to the routine may or may not be valid as defined in task 1. We process them in the same way, whether or not they are valid.
lines := []string{
`[INF] passWord`, // contains 'password' but not surrounded by quotation marks
`"passWord"`, // count this one
`[INF] User saw error message "Unexpected Error" on page load.`, // does not contain 'password'
`[INF] The message "Please reset your password" was ignored by the user`, // count this one
}
CountQuotedPasswords(lines)
// => 2
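A sketch of one way to approach this task uses the (?i) flag, which makes the expression case-insensitive. It relies on the stated guarantee that a line contains at most two quotation marks; the variable name quotedPassword is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// quotedPassword matches a quotation mark, any text, "password" in any
// combination of upper and lower case, any text, and a closing quotation mark.
var quotedPassword = regexp.MustCompile(`(?i)".*password.*"`)

// CountQuotedPasswords counts the lines that mention "password" inside quotes.
func CountQuotedPasswords(lines []string) int {
	count := 0
	for _, line := range lines {
		if quotedPassword.MatchString(line) {
			count++
		}
	}
	return count
}

func main() {
	lines := []string{
		`[INF] passWord`,
		`"passWord"`,
		`[INF] User saw error message "Unexpected Error" on page load.`,
		`[INF] The message "Please reset your password" was ignored by the user`,
	}
	fmt.Println(CountQuotedPasswords(lines)) // => 2
}
```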
You have found that some upstream processing of the logs has been scattering the text "end-of-line" followed by a line number (without an intervening space) throughout the logs.
Implement the RemoveEndOfLineText function, which takes a string, removes the end-of-line text, and returns a "clean" string.
Lines not containing end-of-line text should be returned unmodified.
Just remove the end-of-line text. Do not attempt to adjust the whitespace.
RemoveEndOfLineText("[INF] end-of-line23033 Network Failure end-of-line27")
// => "[INF]  Network Failure "
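One possible sketch for this task replaces every occurrence of "end-of-line" followed by digits with the empty string. Because whitespace is left untouched, the spaces on both sides of a removed marker remain in the result. The variable name endOfLine is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// endOfLine matches the text "end-of-line" directly followed by a line number.
var endOfLine = regexp.MustCompile(`end-of-line\d+`)

// RemoveEndOfLineText removes all end-of-line markers and leaves
// the surrounding whitespace as it is.
func RemoveEndOfLineText(line string) string {
	return endOfLine.ReplaceAllString(line, "")
}

func main() {
	fmt.Printf("%q\n", RemoveEndOfLineText("[INF] end-of-line23033 Network Failure end-of-line27"))
	// => "[INF]  Network Failure "
}
```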
You have noticed that some of the log lines include sentences that refer to users.
These sentences always contain the string "User", followed by one or more space characters, and then a user name.
You decide to tag such lines.
Implement a function TagWithUserName that processes log lines:
Lines that do not contain the string "User " remain unchanged.
For lines that contain the string "User ", prefix the line with [USR] followed by the user name.
For example:
result := TagWithUserName([]string{
"[WRN] User James123 has exceeded storage space.",
"[WRN] Host down. User Michelle4 lost connection.",
"[INF] Users can login again after 23:00.",
"[DBG] We need to check that user names are at least 6 chars long.",
})
// => []string {
// "[USR] James123 [WRN] User James123 has exceeded storage space.",
// "[USR] Michelle4 [WRN] Host down. User Michelle4 lost connection.",
// "[INF] Users can login again after 23:00.",
// "[DBG] We need to check that user names are at least 6 chars long."
// }
You can assume that:
"User "
in each line.
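A capturing group is a natural fit for this task, since the user name has to be extracted from the match. The following is a sketch under one assumption that the text does not state: user names consist only of letters, digits and underscores, so `\w+` is enough to capture them. The variable name userName is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// userName matches "User", one or more whitespace characters, and a user name.
// Assumption: user names consist of letters, digits and underscores only.
var userName = regexp.MustCompile(`User\s+(\w+)`)

// TagWithUserName prefixes every line that refers to a user with
// "[USR] <name> " and returns all other lines unchanged.
func TagWithUserName(lines []string) []string {
	tagged := make([]string, 0, len(lines))
	for _, line := range lines {
		if m := userName.FindStringSubmatch(line); m != nil {
			line = "[USR] " + m[1] + " " + line
		}
		tagged = append(tagged, line)
	}
	return tagged
}

func main() {
	result := TagWithUserName([]string{
		"[WRN] User James123 has exceeded storage space.",
		"[INF] Users can login again after 23:00.",
	})
	for _, line := range result {
		fmt.Println(line)
	}
	// => [USR] James123 [WRN] User James123 has exceeded storage space.
	// => [INF] Users can login again after 23:00.
}
```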