use std::collections::HashSet;
pub fn check(candidate: &str) -> bool {
let mut hs = HashSet::new();
candidate
.bytes()
.filter(|c| c.is_ascii_alphabetic())
.map(|c| c.to_ascii_lowercase())
.all(|c| hs.insert(c))
}
With this approach you will instantiate and update a HashSet to keep track of the used letters.
A use declaration allows directly calling HashSet instead of calling it with its entire namespace.
Without the use declaration, the HashSet would be instantiated like so
let mut hs = std::collections::HashSet::new();
After the HashSet is instantiated, a series of functions are chained from the candidate &str.
- Since all of the characters are ASCII, they can be iterated with the
bytesmethod. Each byte is iterated as au8, which is an unsigned 8-bit integer. - The
filtermethod borrows each byte as a reference to au8(&u8). Inside of its closure it tests each byte to see if itis_ascii_alphabetic. Only bytes which are ASCII letters will survive thefilterto be passed on to themapmethod. - The
mapmethod callsto_ascii_lowercaseon each byte. - Each lowercased byte is then tested by the
allmethod by using theinsertmethod ofHashSet.allwill returntrueif every call toinsertreturns true. If a call toinsertreturnsfalsethenallwill "short-circuit" and immediately returnfalse. Theinsertmethod returns whether the value is newly inserted. So, for the word"alpha",insertwill returntruewhen the firstais inserted, but will returnfalsewhen the secondais inserted.
Refactoring
using the str method to_ascii_lowercase and no map
You might want to to call the str method to_ascii_lowercase and save calling map,
like so
candidate
.to_ascii_lowercase()
.bytes()
.filter(|c| c.is_ascii_alphabetic())
.all(|c| hs.insert(c))
However, changing the case of all characters in a str raised the average benchmark a few nanoseconds.
It is a bit faster to filter out non-ASCII letters and to change the case of each surviving byte.
Since the performance is fairly close, either may be prefered.
using filter_map
Since filter and map are used, this approach could be refactored using the filter_map method.
use std::collections::HashSet;
pub fn check(candidate: &str) -> bool {
let mut hs = HashSet::new();
candidate
.bytes()
.filter_map(|c| c.is_ascii_alphabetic().then(|| c.to_ascii_lowercase()))
.all(|c| hs.insert(c))
}
By chaining the then method to the result of is_ascii_alphabetic,
and calling to_ascii_lowercase in the closure for then,
the filter map passes only lowercased ASCII letter bytes to the all method.
In benchmarking, this approach was slightly slower, but its style may be prefered.
supporting Unicode
By substituting the chars method for the bytes method,
and by using the char methods is_alphabetic and to_lowercase,
this approach can support Unicode characters.
use std::collections::HashSet;
pub fn check(candidate: &str) -> bool {
let mut hs = std::collections::HashSet::new();
candidate
.chars()
.filter(|c| c.is_alphabetic())
.map(|c| c.to_lowercase().to_string())
.all(|c| hs.insert(c))
}
Usually an approach that supports Unicode will be slower than one that supports only bytes.
However the benchmark for this approach was significantly slower, taking more than twice as long as the bytes approach.
It can be further refactored to use the str to_lowercase method and remove the map method
to cut the benchmark down closer to the byte approach.
use std::collections::HashSet;
pub fn check(candidate: &str) -> bool {
let mut hs = std::collections::HashSet::new();
candidate
.to_lowercase()
.chars()
.filter(|c| c.is_alphabetic())
.all(|c| hs.insert(c))
}
To more completely support Unicode, an external crate, such as unicode-segmentation, could be used. This is becasue the std::char can not fully handle things such as grapheme clusters.