use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .bytes()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .all(|c| hs.insert(c))
}
With this approach you will instantiate and update a HashSet to keep track of the used letters.
A use declaration allows directly calling HashSet instead of calling it with its entire namespace.
Without the use declaration, the HashSet would be instantiated like so
let mut hs = std::collections::HashSet::new();
After the HashSet is instantiated, a series of functions are chained from the candidate &str.
- Since all of the characters are ASCII, they can be iterated with the bytes method. Each byte is iterated as a u8, which is an unsigned 8-bit integer.
- The filter method borrows each byte as a reference to a u8 (&u8). Inside of its closure it tests each byte to see if it is_ascii_alphabetic. Only bytes which are ASCII letters will survive the filter to be passed on to the map method.
- The map method calls to_ascii_lowercase on each byte.
- Each lowercased byte is then tested by the all method by using the insert method of HashSet. The all method will return true if every call to insert returns true. If a call to insert returns false then all will "short-circuit" and immediately return false. The insert method returns whether the value is newly inserted. So, for the word "alpha", insert will return true when the first a is inserted, but will return false when the second a is inserted.
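To make the role of insert more visible, here is a minimal sketch that reports which letter caused the failure. The function name first_repeated_letter is hypothetical and not part of the original solution. It swaps the all method for find, which stops at the first element for which the closure returns true.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of the original solution): returns the
/// first letter whose insertion is rejected, i.e. the first repeated letter.
pub fn first_repeated_letter(candidate: &str) -> Option<char> {
    let mut seen = HashSet::new();
    candidate
        .bytes()
        .filter(|b| b.is_ascii_alphabetic())
        .map(|b| b.to_ascii_lowercase())
        // `insert` returns false when the byte is already in the set,
        // so `find` stops at the first duplicate, where `all` would short-circuit.
        .find(|&b| !seen.insert(b))
        .map(char::from)
}
```

For "alpha" this returns Some('a'), because the second a is the first byte that insert rejects. For an isogram such as "isogram" it returns None.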
Refactoring
using the str method to_ascii_lowercase and no map
You might want to call the str method to_ascii_lowercase and save calling map, like so
candidate
    .to_ascii_lowercase()
    .bytes()
    .filter(|c| c.is_ascii_alphabetic())
    .all(|c| hs.insert(c))
However, changing the case of all characters in a str raised the average benchmark by a few nanoseconds, possibly because the str method to_ascii_lowercase allocates a new String.
It is a bit faster to filter out the bytes that are not ASCII letters and to change the case of each surviving byte.
Since the performance is fairly close, either may be preferred.
using filter_map
Since filter and map are used, this approach could be refactored using the filter_map method.
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .bytes()
        .filter_map(|c| c.is_ascii_alphabetic().then(|| c.to_ascii_lowercase()))
        .all(|c| hs.insert(c))
}
By chaining the then method to the result of is_ascii_alphabetic, and calling to_ascii_lowercase in the closure for then, the filter_map method passes only lowercased ASCII letter bytes to the all method.
In benchmarking, this approach was slightly slower, but its style may be preferred.
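The then method on bool returns Some of the closure's result when the bool is true and None otherwise, which is the shape filter_map expects. The sketch below isolates that step. The names lowercase_letter and letters are hypothetical and only serve the illustration.

```rust
/// Hypothetical helper: Some(lowercased byte) for an ASCII letter, None otherwise.
pub fn lowercase_letter(byte: u8) -> Option<u8> {
    // `bool::then` runs the closure only when the condition is true.
    byte.is_ascii_alphabetic().then(|| byte.to_ascii_lowercase())
}

/// Hypothetical helper: collects what `filter_map` would pass on to `all`.
pub fn letters(candidate: &str) -> Vec<u8> {
    // `filter_map` drops every None and unwraps every Some.
    candidate.bytes().filter_map(lowercase_letter).collect()
}
```

Calling lowercase_letter(b'A') gives Some(b'a'), lowercase_letter(b'-') gives None, and letters("Six-Year") yields the bytes of "sixyear".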
supporting Unicode
By substituting the chars method for the bytes method, and by using the char methods is_alphabetic and to_lowercase, this approach can support Unicode characters.
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .chars()
        .filter(|c| c.is_alphabetic())
        .map(|c| c.to_lowercase().to_string())
        .all(|c| hs.insert(c))
}
Usually an approach that supports Unicode will be slower than one that supports only bytes.
In this case the difference was significant: the benchmark for this approach took more than twice as long as the bytes approach.
It can be further refactored to use the str to_lowercase method and remove the map method to cut the benchmark down closer to the byte approach.
use std::collections::HashSet;

pub fn check(candidate: &str) -> bool {
    let mut hs = HashSet::new();
    candidate
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphabetic())
        .all(|c| hs.insert(c))
}
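The to_string call in the first Unicode version is there because the char method to_lowercase returns an iterator rather than a single char: one character can lowercase to more than one. The sketch below illustrates this with a hypothetical helper named lowercase_of. Building a String per character in this way is a plausible contributor to the slower benchmark, though that is an assumption rather than something measured here.

```rust
/// Hypothetical helper: collects the lowercase mapping of one char into a String.
pub fn lowercase_of(c: char) -> String {
    // `char::to_lowercase` yields one or more chars, so collect them all.
    c.to_lowercase().collect()
}
```

For 'A' the result is "a". For 'İ' (U+0130, Latin capital I with dot above) the result is "i\u{307}", which is two chars.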
To more completely support Unicode, an external crate, such as unicode-segmentation, could be used. This is because std::char cannot fully handle things such as grapheme clusters.
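As a small illustration of the grapheme cluster limitation, using only the standard library, the sketch below compares two encodings of the same visible letter: é as one precomposed char (U+00E9) and é as e followed by a combining acute accent (U+0301). The name check_chars is a hypothetical label for the char-based version above, and scalar_count is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Same logic as the char-based version above, under a hypothetical name.
pub fn check_chars(candidate: &str) -> bool {
    let mut seen = HashSet::new();
    candidate
        .to_lowercase()
        .chars()
        .filter(|c| c.is_alphabetic())
        .all(|c| seen.insert(c))
}

/// Hypothetical helper: number of chars (Unicode scalar values) in a string.
pub fn scalar_count(s: &str) -> usize {
    s.chars().count()
}
```

Here scalar_count("\u{e9}") is 1 while scalar_count("e\u{301}") is 2, even though both display as é. As a result, check_chars("\u{e9}e") returns true, but check_chars("e\u{301}e") returns false because the plain e is seen twice. A grapheme-aware crate would treat the decomposed é as a single unit.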