
Strings in Python


About Strings

A str in Python is an immutable sequence of Unicode code points. These may include letters, diacritical marks, positioning characters, numbers, currency symbols, emoji, punctuation, space and line break characters, and more.
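As a small illustration of what "immutable sequence of code points" means in practice, the sketch below uses the built-ins len() and ord() and shows that item assignment fails. The variable names are illustrative only.

```python
greeting = "h\u00e9llo"  # "héllo", with é written as the single code point U+00E9

# len() counts code points; ord() gives the integer value of one code point.
length = len(greeting)
second_code_point = ord(greeting[1])

# Strings are immutable, so assigning to an index raises TypeError.
try:
    greeting[0] = "J"
    mutated = True
except TypeError:
    mutated = False

# "Changing" a string really means building a new one.
new_greeting = "J" + greeting[1:]

print(length, second_code_point, mutated, new_greeting)
```

The original string is left untouched; new_greeting is a separate object.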

For a deep dive on what information a string encodes (or, "how does a computer know how to translate zeroes and ones into letters?"), this blog post is enduringly helpful. The Python docs also provide a very detailed Unicode HOWTO that discusses Python's support for the Unicode specification in the str, bytes and re modules, considerations for locales, and some common issues with encoding and translation.

Strings implement all common sequence operations and can be iterated through using for item in <str> or for index, item in enumerate(<str>) syntax. Individual code points (strings of length 1) can be referenced by 0-based index number from the left, or by negative index number from the right, starting at -1.

Strings can be concatenated with +, or via <str>.join(<iterable>), split via <str>.split(<separator>), and offer multiple formatting and assembly options.

A str literal can be declared via single ' or double " quotes. The escape \ character is available as needed.


>>> single_quoted = 'These allow "double quoting" without "escape" characters.'

>>> double_quoted = "These allow embedded 'single quoting', so you don't have to use an 'escape' character."
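When the quote character you need matches the delimiter, or you need a newline, tab, or literal backslash, the escape character comes into play. A minimal sketch, with made-up variable names:

```python
# A backslash escapes the quote character that delimits the literal.
escaped_single = 'You don\'t need double quotes here.'
plain_double = "You don't need double quotes here."

# Common escape sequences: newline, tab, and a literal backslash.
two_lines = "first\nsecond"
tab = "\t"
windows_path = "C:\\temp\\notes.txt"

print(escaped_single == plain_double)  # True
print(two_lines.splitlines())          # ['first', 'second']
print(len(tab))                        # 1
print(windows_path)                    # C:\temp\notes.txt
```

Each escape sequence counts as a single code point in the resulting string.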

Multi-line strings are declared with ''' or """.

>>> triple_quoted = '''Three single quotes or "double quotes" in a row allow for multi-line string literals.
  Line break characters, tabs and other whitespace are fully supported. Remember - The escape "\" character is also available if needed (as can be seen below).
  
  You\'ll most often encounter multi-line strings as "doc strings" or "doc tests" written just below the first line of a function or class definition.
    They\'re often used with auto documentation ✍ tools.
    '''
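To make the "doc string" use concrete, here is a small sketch. The function shout is hypothetical and exists only for this example.

```python
def shout(text):
    """Return text in upper case with an exclamation mark appended.

    This triple-quoted string is the function's docstring.
    """
    return text.upper() + "!"


# The docstring is stored on the function object; help() displays it too.
first_line = shout.__doc__.splitlines()[0]

print(first_line)
print(shout("hello"))  # HELLO!
```

Documentation tools typically read this same __doc__ attribute.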

The str(<object>) constructor can be used to create/coerce strings from other objects:

>>> my_number = 42
>>> str(my_number)
'42'

While the str(<object>) constructor can be used to coerce/convert other objects to strings, it will not iterate over or unpack an object. This is different from the behavior of constructors for other data types such as list(), set(), dict(), or tuple(), and can have surprising results.

>>> numbers = [1, 3, 5, 7]
>>> str(numbers)
'[1, 3, 5, 7]'
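The contrast with list() can be seen side by side. This is a minimal sketch; the final line shows one common way (an assumption about intent, not the only approach) to build a string from a list's elements.

```python
numbers = [1, 3, 5, 7]

# list() iterates over its argument, so a string is unpacked into pieces.
letters = list("abc")            # ['a', 'b', 'c']

# str() does not iterate; it returns the text representation of the whole list.
as_text = str(numbers)           # '[1, 3, 5, 7]'

# To combine the elements themselves, convert each one and join them.
digits = "".join(str(n) for n in numbers)   # '1357'

print(letters, as_text, digits)
```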

Code points within a str can be referenced by 0-based index number from the left:

creative = '창의적인'

>>> creative[0]
'창'

>>> creative[2]
'적'

>>> creative[3]
'인'

Indexing also works from the right, starting with a -1-based index:

creative = '창의적인'

>>> creative[-4]
'창'

>>> creative[-2]
'적'

>>> creative[-1]
'인'
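The relationship between the two indexing directions, and what happens past the ends, can be sketched as follows (variable names other than creative are illustrative):

```python
creative = '창의적인'

# A negative index -k refers to the same position as len(creative) - k.
same_item = creative[-1] == creative[len(creative) - 1]
first_both_ways = creative[-4] == creative[0]

# Indexing past either end raises IndexError.
try:
    creative[4]
    out_of_range = False
except IndexError:
    out_of_range = True

print(same_item, first_both_ways, out_of_range)  # True True True
```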

There is no separate "character" or "rune" type in Python, so indexing a string produces a new str of length 1:


>>> website = "exercism"
>>> type(website[0])
<class 'str'>

>>> len(website[0])
1

>>> website[0] == website[0:1] == 'e'
True

Substrings can be selected via slice notation, using <str>[<start>:<stop>:<step>] to produce a new string. Results exclude the stop index. If no start is given, the starting index will be 0. If no stop is given, the stop index will be the end of the string.

moon_and_stars = '🌟🌟🌙🌟🌟⭐'

>>> moon_and_stars[1:4]
'🌟🌙🌟'

>>> moon_and_stars[:3]
'🌟🌟🌙'

>>> moon_and_stars[3:]
'🌟🌟⭐'

>>> moon_and_stars[:-1]
'🌟🌟🌙🌟🌟'

>>> moon_and_stars[:-3]
'🌟🌟🌙'
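The step part of slice notation is not exercised in the examples above, so here is a short sketch using a plain ASCII string (chosen only to keep the positions easy to count):

```python
letters = "abcdefgh"

every_other = letters[::2]        # 'aceg'
from_one_by_three = letters[1::3] # 'beh'
reversed_letters = letters[::-1]  # 'hgfedcba'

# Unlike indexing, slicing past the end does not raise; it stops at the end.
past_the_end = letters[5:100]     # 'fgh'

print(every_other, from_one_by_three, reversed_letters, past_the_end)
```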

Strings can also be broken into smaller strings via <str>.split(<separator>), which will return a list of substrings. Using <str>.split() without any arguments will split the string on whitespace.

>>> cat_ipsum = "Destroy house in 5 seconds command the hooman."
>>> cat_ipsum.split()
['Destroy', 'house', 'in', '5', 'seconds', 'command', 'the', 'hooman.']


>>> cat_words = "feline, four-footed, ferocious, furry"
>>> cat_words.split(',')
[' feline'.strip(), ' four-footed', ' ferocious', ' furry']


>>> colors = """red,
orange,
green,
purple,
yellow"""

>>> colors.split(',\n')
['red', 'orange', 'green', 'purple', 'yellow']
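A few details of split() are easy to miss: the leftover spaces in the cat_words result, the optional maxsplit argument, and the difference between split() and split(" "). A minimal sketch, where gappy is a made-up example string:

```python
cat_words = "feline, four-footed, ferocious, furry"

# Strip each piece to drop the space left behind after every comma.
cleaned = [word.strip() for word in cat_words.split(",")]

# The second argument (maxsplit) limits how many splits are made.
head, rest = cat_words.split(", ", 1)

# With no argument, runs of whitespace count as one separator.
# With an explicit separator, every occurrence splits, so empty strings can appear.
gappy = "a  b"
on_whitespace = gappy.split()    # ['a', 'b']
on_one_space = gappy.split(" ")  # ['a', '', 'b']

print(cleaned)
print(head, "|", rest)
print(on_whitespace, on_one_space)
```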

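Strings can be concatenated using the + operator. Because strings are immutable, every + builds a new string, so this approach should be used sparingly: it is not very performant when combining many pieces, and long chains are hard to maintain.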

language = "Ukrainian"
number = "nine"
word = "дев'ять"

sentence = word + " " + "means" + " " + number + " in " + language + "."

>>> print(sentence)
дев'ять means nine in Ukrainian.

If a list, tuple, set or other collection of individual strings needs to be combined into a single str, <str>.join(<iterable>) is a better option:

# str.join() makes a new string from the iterable's elements.
>>> chickens = ["hen", "egg", "rooster"]
>>> ' '.join(chickens)
'hen egg rooster'

# Any string can be used as the joining element.
>>> ' :: '.join(chickens)
'hen :: egg :: rooster'

>>> ' 🌿 '.join(chickens)
'hen 🌿 egg 🌿 rooster'
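One caveat worth sketching: join() only accepts strings, so other element types need converting first. The scores list below is invented for the example.

```python
scores = [3, 14, 15]

# Joining non-string elements raises TypeError.
try:
    ", ".join(scores)
    joined_directly = True
except TypeError:
    joined_directly = False

# Converting each element with str() first makes the join work.
joined = ", ".join(str(score) for score in scores)

print(joined_directly)  # False
print(joined)           # 3, 14, 15
```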

Strings support all common sequence operations. Individual code points can be iterated through in a loop via for item in <str>. Indexes with items can be iterated through in a loop via for index, item in enumerate(<str>).


>>> exercise = 'လေ့ကျင့်'

# Note that there are more code points than perceived glyphs or characters
>>> for code_point in exercise:
...    print(code_point)
...
လ
ေ
့
က
ျ
င
်
့

# Using enumerate will give both the value and index position of each element.
>>> for index, code_point in enumerate(exercise):
...    print(index, ": ", code_point)
...
0 :  လ
1 :  ေ
2 :  ့
3 :  က
4 :  ျ
5 :  င
6 :  ်
7 :  ့
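The note about code points outnumbering perceived characters also applies to accented Latin letters. As a hedged sketch, the standard-library unicodedata module can show (and sometimes reduce) the difference; the example string is an assumption chosen for simplicity.

```python
import unicodedata

# "e" followed by U+0301 (combining acute accent) displays as one character.
decomposed = "e\u0301"

# NFC normalization combines the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

decomposed_length = len(decomposed)             # 2
composed_length = len(composed)                 # 1
mark_name = unicodedata.name(decomposed[1])     # 'COMBINING ACUTE ACCENT'

print(decomposed_length, composed_length, mark_name)
```

Not every sequence has a precomposed form, so normalization does not always make len() match what a reader perceives.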

String Methods

Python provides a rich set of string methods that can assist with searching, cleaning, splitting, transforming, translating, and many other operations. A selection of these methods is covered in another exercise.
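As a brief taste, the sketch below applies a handful of these methods to a made-up string. Each method returns a new value and leaves the original untouched.

```python
raw = "  Hello, Exercism!  "

cleaned = raw.strip()                            # 'Hello, Exercism!'
position = cleaned.find("Exercism")              # 7
shouted = cleaned.upper()                        # 'HELLO, EXERCISM!'
replaced = cleaned.replace("Hello", "Goodbye")   # 'Goodbye, Exercism!'
starts_with_hello = cleaned.startswith("Hello")  # True

print(cleaned, position, shouted, replaced, starts_with_hello)
```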

Formatting

Python also provides a rich set of tools for formatting and templating strings, as well as more sophisticated text processing through the re (regular expressions), difflib (sequence comparison), and textwrap modules. For a great introduction to string formatting in Python, see this post at Real Python. For an introduction to string methods, see Strings and Character Data in Python at the same site.
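A minimal sketch of two of these tools, f-strings with a format specification and textwrap.wrap(), is shown here. The values and variable names are assumptions made for the example.

```python
import textwrap

name = "moon"
distance_km = 384400.0

# f-string with a format spec: comma thousands separator, one decimal place.
f_string = f"The {name} is {distance_km:,.1f} km away."

# str.format() accepts the same format spec.
formatted = "The {} is {:,.1f} km away.".format(name, distance_km)

# textwrap.wrap() breaks text into lines no longer than the given width.
sentence = "Strings can be wrapped to a fixed width."
wrapped = textwrap.wrap(sentence, width=20)

print(f_string)  # The moon is 384,400.0 km away.
for line in wrapped:
    print(line)
```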

In addition to str (a text sequence), Python has corresponding binary sequence types summarized under binary data services -- bytes (a binary sequence), bytearray, and memoryview for the efficient storage and handling of binary data. Additionally, Streams allow sending and receiving binary data over a network connection without using callbacks.
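The relationship between str and bytes can be sketched with encode() and decode(). UTF-8 is chosen here as an assumption; other encodings produce different bytes.

```python
text = "caf\u00e9"  # "café"

# Encoding turns code points into bytes; é takes two bytes in UTF-8.
encoded = text.encode("utf-8")   # b'caf\xc3\xa9'
text_length = len(text)          # 4 code points
byte_length = len(encoded)       # 5 bytes

# Decoding with the same encoding restores the original string.
decoded = encoded.decode("utf-8")

# Indexing bytes yields an int, not a length-1 bytes object.
first_byte = encoded[0]          # 99, the value for "c"

print(encoded, text_length, byte_length, decoded == text, first_byte)
```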
