Explore the 'Regex join' approach for Acronym in Python on Exercism

import re

###re.findall###

def abbreviate(to_abbreviate):
    #Capitalize the input before cleaning.
    removed = re.findall(r"[a-zA-Z']+", to_abbreviate.upper())
    
    return ''.join(word[0] for word in removed)

#OR#

def abbreviate(to_abbreviate):
    #Capitalize the result after joining.
    return ''.join(word[0] for word in
                   re.findall(r"[a-zA-Z']+", to_abbreviate)).upper()
                   
###re.finditer###

def abbreviate(to_abbreviate):
    #Capitalize the input before cleaning.
    removed = re.finditer(r"[a-zA-Z']+", to_abbreviate.upper())

    #word.group(0)[0] (first letter of Matched word) can also be written as
    #word[0][0], with the first bracketed number referring to Match group 0.
    return ''.join(word.group(0)[0] for word in removed)

#OR#

def abbreviate(to_abbreviate):
    #Capitalize the output after joining.
    #Use bracket notation for Match group.
    return ''.join(word[0][0] for word in
                   re.finditer(r"[a-zA-Z']+", to_abbreviate)).upper()

This approach begins by using the re.findall() function from the re module to extract the words from to_abbreviate, leaving behind separators such as -, _, and white space. The apostrophe is deliberately kept as part of a word, so that a possessive such as Halley's counts as one word. Python's re module provides support for regular expressions within the language, and has many useful functions for searching, parsing, and modifying text. The scan for matches starts at the left-hand side of the input and travels toward the right.

re.findall() searches text for all non-overlapping matches of a pattern and returns them as a list of strings. Empty matches are included in general, although the + quantifier used here never produces one.
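As a minimal sketch of that return value (the sample phrase and variable names are illustrative assumptions, not part of the exercise), the snippet below calls re.findall() with the approach's pattern and, for contrast, with a * quantifier that is allowed to match nothing:

```python
import re

# Illustrative input; any phrase would do.
phrase = "Portable Network Graphics"

# With the + quantifier every match contains at least one character.
plus_matches = re.findall(r"[a-zA-Z']+", phrase)
print(plus_matches)  # ['Portable', 'Network', 'Graphics']

# With the * quantifier the pattern may match zero characters, so the
# returned list also contains empty strings where no letter was found.
star_matches = re.findall(r"[a-zA-Z']*", phrase)
print("" in star_matches)  # True
```

Because the approach uses +, the list handed to the generator expression never contains an empty string, so word[0] is always safe.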

The re.finditer() function works in the same fashion as re.findall(), but returns its results as a lazy iterator over Match objects. This means that re.finditer() produces matches on demand instead of building the whole list in memory, but the matched text has to be extracted from each Match object as the iterator is consumed.
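The following sketch, using an illustrative phrase and variable names that are assumptions of this example, shows what consuming that iterator could look like, and that match.group(0) and match[0] are interchangeable:

```python
import re

phrase = "Portable Network Graphics"  # illustrative input

matches = re.finditer(r"[a-zA-Z']+", phrase)

# Matches are produced one at a time as the loop requests them.
first_letters = []
for match in matches:
    # match.group(0) and match[0] both return the full matched text.
    assert match.group(0) == match[0]
    first_letters.append(match[0][0])

print(first_letters)  # ['P', 'N', 'G']

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(matches)
print(leftover)  # []
```

The single-use nature of the iterator is worth keeping in mind if the matches are needed more than once; in that case re.findall() or list(re.finditer(...)) may be the simpler choice.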

The regular expression r"[a-zA-Z']+" in the code examples matches any single character in the ranges a-z (lowercase) and A-Z (uppercase), plus the ' (apostrophe) character. The + quantifier is 'greedy' and matches the preceding character class one to unlimited times. This means that the expression will match any run of letters (a word), but will never match white space or any other 'non-letter' character, such as \t, \n, a space, _, or -.

For example, in Complementary metal-oxide semiconductor, the regex will match Complementary, metal, oxide, and semiconductor. The regex will not match the spaces or the -. The result returned by findall() will then be ['Complementary', 'metal', 'oxide', 'semiconductor'].
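Two further inputs, chosen here purely for illustration, show why the apostrophe sits inside the character class while underscores and hyphens do not. The constant name PATTERN is an assumption of this sketch:

```python
import re

PATTERN = r"[a-zA-Z']+"

# The apostrophe is inside the character class, so "Halley's" stays one word.
with_apostrophe = re.findall(PATTERN, "Halley's Comet")
print(with_apostrophe)  # ["Halley's", 'Comet']

# Without the apostrophe the word is split and a stray 's' appears,
# which would add an unwanted letter to the acronym.
without_apostrophe = re.findall(r"[a-zA-Z]+", "Halley's Comet")
print(without_apostrophe)  # ['Halley', 's', 'Comet']

# Underscores are outside the class, so they act as separators.
underscored = re.findall(PATTERN, "The Road _Not_ Taken")
print(underscored)  # ['The', 'Road', 'Not', 'Taken']
```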

Note

to_abbreviate.replace("_", " ").replace("-", " ").upper().split() can also be used to 'scrub' to_abbreviate and turn the results into a list. The .replace() approach benchmarked faster than using re.findall()/re.finditer() to 'scrub', most likely due to overhead in importing the re module and in the backtracking behavior of regex searching and matching.
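One rough way to check this on your own machine is a small timeit comparison. This is only a sketch: the helper names scrub_with_replace and scrub_with_regex are hypothetical, the sample phrase and iteration count are arbitrary, and the timings will vary with hardware and Python version.

```python
import re
import timeit


def scrub_with_replace(phrase):
    # Hypothetical helper: the .replace()/.split() scrub from the note.
    return phrase.replace("_", " ").replace("-", " ").upper().split()


def scrub_with_regex(phrase):
    # Hypothetical helper: the re.findall() scrub from this approach.
    return re.findall(r"[a-zA-Z']+", phrase.upper())


phrase = "Complementary metal-oxide semiconductor"

# Both scrubs produce the same word list for this input.
assert scrub_with_replace(phrase) == scrub_with_regex(phrase)

timings = {}
for scrub in (scrub_with_replace, scrub_with_regex):
    # Time 10,000 calls of each scrub on the same phrase.
    timings[scrub.__name__] = timeit.timeit(lambda: scrub(phrase), number=10_000)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

The two scrubs are not identical in general: the .replace() version only treats _ and - as separators, so other punctuation stays attached to its word (for example 'PHP:'), although the first letter of each word is usually unaffected.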

Once findall() or finditer() has been called, a generator expression iterates through the results and selects the first letter of each word via bracket notation. Note that when using finditer(), the matched text has to be extracted from each Match object via match.group(0)/match[0] before the first letter can be selected.

Generator expressions are short-form generators - lazy iterators that produce their values on demand, instead of saving them to memory. This generator expression is consumed by str.join(), which joins the generated letters together using an empty string. Other "separator" strings can be used with str.join() - see string-methods for some additional examples.
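A small sketch of this behavior follows. The word list and variable names are illustrative assumptions; the list stands in for the already-scrubbed output of findall():

```python
# Stand-in for the uppercased word list produced by the scrub step.
words = ["PORTABLE", "NETWORK", "GRAPHICS"]

# A generator expression builds no list; letters are produced on demand.
letters = (word[0] for word in words)
kind = type(letters).__name__
print(kind)  # generator

# str.join() consumes the generator, using "" as the separator.
acronym = "".join(letters)
print(acronym)  # PNG

# The generator is now exhausted, so joining it again gives an empty string.
second_pass = "".join(letters)

# A different separator string is placed between the letters.
dotted = ".".join(word[0] for word in words)
print(dotted)  # P.N.G
```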

Finally, the result of .join() is uppercased using the chained .upper(). Alternatively, .upper() can be called on to_abbreviate inside the findall()/finditer() call, to uppercase the input before cleaning. Since the combination of generator expression, join, and upper is fairly succinct, it can be placed directly on the return line rather than assigning and returning an intermediate variable for the acronym.

This approach was less performant in benchmarks than those using loop, map, list-comprehension, and reduce.

25th Jun 2025

from functools import reduce


def abbreviate(to_abbreviate):
    phrase = to_abbreviate.replace("_", " ").replace("-", " ").upper().split()

    return reduce(lambda start, word: start + word[0], phrase, "")

def abbreviate(to_abbreviate):
    phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()

    # note the lack of square brackets around the comprehension.
    return ''.join(word[0] for word in phrase)
