Explore the 'Regex Sub' approach for Acronym in Python on Exercism

Approach: use `re.sub`

import re


def abbreviate_regex_sub(to_abbreviate):
    pattern = re.compile(r"(?<!_)\B[\w']+|[ ,\-_]")

    return re.sub(pattern, "", to_abbreviate).upper()

###OR###

def abbreviate_regex_sub(to_abbreviate):
    return re.sub(r"(?<!_)\B[\w']+|[ ,\-_]", "", to_abbreviate.upper())

This approach uses the re.sub() function from the re module to "scrub" (remove) unwanted characters from to_abbreviate: apostrophes, hyphens, underscores, commas, white space, and every character of each word except the first. Python's re module provides support for regular expressions within the language, and has many useful functions for searching, parsing, and modifying text.

re.sub() finds every non-overlapping match of the pattern in the text and substitutes a replacement string for each one (in our case, an empty string). Regular expression matching starts at the left-hand side of the input and travels toward the right.
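As a minimal sketch of substitution with an empty replacement, the snippet below deletes separator characters from a made-up sample string (the string is an illustration, not one of the exercise inputs). It also uses re.subn(), which does the same substitution and reports how many replacements were made.

```python
import re

# Replacing each match with the empty string deletes it.
cleaned = re.sub(r"[ ,\-_]", "", "a-b, c_d")
print(cleaned)  # abcd

# re.subn() returns a (new_string, number_of_replacements) tuple.
result, count = re.subn(r"[ ,\-_]", "", "a-b, c_d")
print(result, count)  # abcd 4
```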

Caution

While it is a fun experiment to see if the entire problem can be more or less solved with a single regex, the excessive backtracking used in this solution slows down performance considerably. This solution tested the slowest of all solutions during benchmarking, taking 652 steps in the regex engine to find and replace 82 matches.

A more performant method of cleaning would be to use re.findall() or re.finditer() to pick the words out of the phrase, and then process the results with a list comprehension or loop to extract the first letter of each word. to_abbreviate.replace("_", " ").replace("-", " ").upper().split() can also be used, and is even more performant here for cleaning test inputs.
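One way the re.findall() alternative might look is sketched below. The function name abbreviate_findall and the word pattern [A-Za-z']+ are assumptions made for illustration, not part of the original approach. The pattern treats any run of ASCII letters and apostrophes as a word, so it would need adjusting for inputs that contain digits or non-ASCII letters.

```python
import re


def abbreviate_findall(to_abbreviate):
    # Hypothetical helper: collect runs of letters/apostrophes as words,
    # then join the first character of each word.
    words = re.findall(r"[A-Za-z']+", to_abbreviate)
    return "".join(word[0] for word in words).upper()


print(abbreviate_findall("The Road _Not_ Taken"))  # TRNT
print(abbreviate_findall("Halley's Comet"))  # HC
```

Here the regular expression only has to find words, and ordinary string handling takes the first letters.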

However, if nothing but a regular expression will do, the third-party regex module provides more tools for lookarounds, recursion, partial matches, and nested sets. Experimenting with that third-party library in your local environment (the Exercism Python track does not support third-party libraries) could aid in optimizing this complicated regular expression and help with extracting first letters to form acronyms.

The regular expression (?<!_)\B[\w']+|[ ,\-_] in the code example above has two alternatives for matching. For convenience and reuse, the first version of the function compiles the regex using re.compile(). Alternatives are separated with the pipe (|) symbol:

(?<!_) is a negative lookbehind that guards the first alternative. It prevents \B[\w']+ from matching at a position immediately preceded by _, so a letter that directly follows an underscore is kept (for example, the N in _Not_ is not matched, while the _ before it, which has a preceding space, is matched by the second alternative).
\B[\w']+ starts at a position that is not a word boundary (that is, inside a word) and looks for characters in the set [\w'], which covers letters, digits, the underscore, and the apostrophe. The + operator is a 'greedy' quantifier that matches a character in the preceding set one or more times, as many times as possible. This means that the expression matches the remainder of a word after its first character, but will not match on anything else.
[ ,\-_] matches a single space, comma, hyphen, or underscore.
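The effect of the lookbehind can be checked by running the same substitution with and without it. This is a small sketch; the variable names are chosen for illustration only.

```python
import re

with_lookbehind = r"(?<!_)\B[\w']+|[ ,\-_]"
without_lookbehind = r"\B[\w']+|[ ,\-_]"

phrase = "The Road _Not_ Taken"

# With the lookbehind, the N that follows the underscore survives.
kept = re.sub(with_lookbehind, "", phrase)
print(kept)  # TRNT

# Without it, \B[\w']+ matches "Not_" right after the underscore,
# so the N is removed along with the rest of the word.
lost = re.sub(without_lookbehind, "", phrase)
print(lost)  # TRT
```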

Because these matches are used in the re.sub() method, an empty string is substituted - so the matches are removed from the result.

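As an example, for the input phrase The Road _Not_ Taken, the regex will match he, a space, oad, a space, _, ot_, a space, and aken, replacing each match with ''. The trailing underscore of _Not_ is consumed together with ot, because \w also matches _. The result is the string TRNT.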
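To inspect exactly what the pattern matches for a given phrase, the compiled pattern's findall() method can be used. Because the pattern contains no capturing groups, it returns the full text of each match. A short sketch:

```python
import re

pattern = re.compile(r"(?<!_)\B[\w']+|[ ,\-_]")
phrase = "The Road _Not_ Taken"

# Each list entry is one span that re.sub() would remove.
matches = pattern.findall(phrase)
print(matches)  # ['he', ' ', 'oad', ' ', '_', 'ot_', ' ', 'aken']

acronym = pattern.sub("", phrase).upper()
print(acronym)  # TRNT
```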

To ensure that the result is capitalized for any input, the first version chains .upper() to the result of re.sub() on the return line to produce the final acronym. The second version instead calls .upper() on the input before passing it to re.sub().

To play with this regex and see a more in-depth explanation, you can use it on regex101.

4th Jun 2025

from functools import reduce


def abbreviate(to_abbreviate):
    phrase = to_abbreviate.replace("_", " ").replace("-", " ").upper().split()

    return reduce(lambda start, word: start + word[0], phrase, "")

def abbreviate(to_abbreviate):
    phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()

    # note the lack of square brackets around the comprehension.
    return ''.join(word[0] for word in phrase)

Other Approaches to Acronym in Python

Functools Reduce

Generator Expression

List Comprehension

Loop

Map Built-in

Regex join
