def abbreviate(to_abbreviate):
    phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()

    # Note the lack of square brackets around the comprehension.
    return ''.join(word[0] for word in phrase)

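- This approach begins by using `str.replace()` to "scrub" `to_abbreviate`, replacing the non-letter characters `-` and `_` with spaces. Apostrophes inside a word can stay, since only the first letter of each word is used.
- The phrase is then upper-cased by calling `str.upper()`.
- Finally, the phrase is turned into a `list` of words by calling `str.split()`, which also discards the white space between words.
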
The three methods above are all chained together, with the output of one method serving as the input to the next method in the "chain".
This works because both `replace()` and `upper()` return strings, and both `upper()` and `split()` are called on strings.
However, if `split()` were called first, `replace()` and `upper()` would fail, since a `list` has neither method.
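
The chain described above can be unrolled into separate steps to see what each method returns. This is only an illustrative sketch: the sample phrase and the variable names (`scrubbed`, `uppercased`, `words`, `split_first_error`) are assumptions made for the example and are not part of the solution.

```python
phrase = "Complementary metal-oxide semiconductor"

# Each call returns a new str, so the next str method can be chained onto it.
scrubbed = phrase.replace('-', ' ').replace('_', ' ')
uppercased = scrubbed.upper()

# split() ends the chain: it returns a list, not a str.
words = uppercased.split()
print(words)

# Calling split() first breaks the chain, because list has no replace() method.
try:
    phrase.split().replace('-', ' ')
    split_first_error = None
except AttributeError as error:
    split_first_error = error
    print(f"AttributeError: {error}")
```
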
`re.findall()` or `re.finditer()` can also be used to "scrub" `to_abbreviate`.
These two functions from the `re` module return a `list` or a lazy iterator of results, respectively.
As of this writing, both of these functions benchmark slower than using `str.replace()` for scrubbing.
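
For comparison, a regex-based scrub might look roughly like the sketch below. The pattern `[A-Za-z']+` and the function names are assumptions chosen for illustration, not the code that was benchmarked. The pattern treats runs of letters and apostrophes as words, so a phrase whose word begins with an apostrophe would need extra handling.

```python
import re

# Hypothetical pattern: a "word" is a run of ASCII letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def abbreviate_with_findall(to_abbreviate):
    # findall() builds the full list of matched words up front.
    words = WORD.findall(to_abbreviate)
    return ''.join(word[0] for word in words).upper()


def abbreviate_with_finditer(to_abbreviate):
    # finditer() yields match objects lazily, one word at a time.
    matches = WORD.finditer(to_abbreviate)
    return ''.join(match.group()[0] for match in matches).upper()
```

Both functions leave `-`, `_`, and white space out of the matches, so no separate `replace()` or `split()` step is needed.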
A generator expression is then used to iterate through the phrase and select the first letter of each word via bracket notation.
Generator expressions are short-form generators - lazy iterators that produce their values on demand, instead of saving them to memory.
This generator expression is consumed by `str.join()`, which joins the generated letters together using an empty string.
Other "separator" strings can be used with `str.join()`; see string-methods for some additional examples.
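
A small sketch of how the generator expression and `str.join()` interact is shown below. The word list and the variable names are made up for the example.

```python
words = ['PORTABLE', 'NETWORK', 'GRAPHICS']

# Creating the generator expression does not compute any letters yet.
first_letters = (word[0] for word in words)

# join() consumes the generator, using the empty string as the separator.
acronym = ''.join(first_letters)

# A generator can only be consumed once, so a second join() finds nothing.
leftover = ''.join(first_letters)

# Any other string can serve as the separator.
dotted = '.'.join(word[0] for word in words)

print(acronym, repr(leftover), dotted)
```
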
Since the generator expression and `join()` are fairly succinct, they are put directly on the `return` line rather than assigning and returning an intermediate variable for the acronym.
In benchmarks, this solution was surprisingly slower than the list comprehension version.
This article from Oscar Alsing briefly explains why.
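
To try the comparison locally, a rough `timeit` sketch such as the following could be used. The function names, the sample phrase, and the repetition count are assumptions for illustration. This is not the benchmark referred to above, and timings will vary by machine and Python version. One commonly cited reason for the difference is that CPython's `str.join()` first collects a non-sequence argument into a sequence before joining, so the generator's laziness saves nothing here.

```python
import timeit


def abbreviate_generator(to_abbreviate):
    phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()
    # Generator expression: no square brackets.
    return ''.join(word[0] for word in phrase)


def abbreviate_list(to_abbreviate):
    phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()
    # List comprehension: the list is built before join() is called.
    return ''.join([word[0] for word in phrase])


sample = "Complementary metal-oxide semiconductor"

for function in (abbreviate_generator, abbreviate_list):
    seconds = timeit.timeit(lambda: function(sample), number=100_000)
    print(f"{function.__name__}: {seconds:.3f}s")
```
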