A str in Python is an immutable sequence of Unicode code points.
These may include letters, diacritical marks, positioning characters, numbers, currency symbols, emoji, punctuation, space and line break characters, and more.
For a deep dive on what information a string encodes (or, "how does a computer know how to translate zeroes and ones into letters?"), this blog post is enduringly helpful.
The Python docs also provide a very detailed Unicode HOWTO that discusses Python's support for the Unicode specification in the str, bytes and re modules, considerations for locales, and some common issues with encoding and translation.
Strings implement all common sequence operations and can be iterated through using for item in <str> or for index, item in enumerate(<str>) syntax.
Individual code points (strings of length 1) can be referenced by 0-based index number from the left, or -1-based index number from the right.
Strings can be concatenated with +, or via <str>.join(<iterable>), split via <str>.split(<separator>), and offer multiple formatting and assembly options.
A str literal can be declared via single ' or double " quotes. The escape \ character is available as needed.
>>> single_quoted = 'These allow "double quoting" without "escape" characters.'
>>> double_quoted = "These allow embedded 'single quoting', so you don't have to use an 'escape' character."
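Besides escaping quotes, the backslash introduces special escape sequences. The sketch below shows a few common ones; the variable names are illustrative only.

```python
# A backslash starts an escape sequence inside a str literal.
newline = "first line\nsecond line"   # \n is a line break
tabbed = "name\tvalue"                # \t is a tab
backslash = "C:\\temp"                # \\ produces one literal backslash
apostrophe = 'don\'t'                 # \' embeds a quote in a single-quoted literal

print(newline)          # prints two separate lines
print(len("\n"))        # 1: an escape sequence is a single code point
print(backslash)        # C:\temp
```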
Multi-line strings are declared with ''' or """.
>>> triple_quoted = '''Three single quotes or "double quotes" in a row allow for multi-line string literals.
Line break characters, tabs and other whitespace are fully supported. Remember: the escape "\\" character is also available if needed (as can be seen below).
You\'ll most often encounter multi-line strings as "doc strings" or "doc tests" written just below the first line of a function or class definition.
They\'re often used with auto-documentation tools.
'''
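Triple-quoted literals are most often seen as docstrings. Here is a minimal sketch, using a hypothetical greet() function, showing where a docstring ends up and how line breaks typed inside triple quotes become \n characters:

```python
def greet(name):
    """Return a friendly greeting.

    The first string literal in a function body becomes its docstring.
    """
    return f"Hello, {name}!"

# The docstring is stored on the function object as __doc__.
print(greet.__doc__.splitlines()[0])   # Return a friendly greeting.

# Line breaks typed inside triple quotes become '\n' characters in the str.
poem = """roses
violets"""
print(poem.split("\n"))                # ['roses', 'violets']
```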
The str(<object>) constructor can be used to create/coerce strings from other objects:
>>> my_number = 42
>>> str(my_number)
'42'
While the str(<object>) constructor can be used to convert objects to strings, it will not iterate over or unpack an object.
This is different from the behavior of constructors for other data types such as list(), set(), dict(), or tuple(), and can have surprising results.
>>> numbers = [1,3,5,7]
>>> str(numbers)
'[1, 3, 5, 7]'
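To make the contrast with other constructors concrete, here is a short sketch comparing str() on a list with list() and tuple() on a string:

```python
numbers = [1, 3, 5, 7]

# str() builds one string describing the whole list, brackets and all.
as_text = str(numbers)
print(as_text)             # [1, 3, 5, 7]
print(len(as_text))        # 12

# list() and tuple() do iterate a string, one code point at a time.
print(list("hello"))       # ['h', 'e', 'l', 'l', 'o']
print(tuple("abc"))        # ('a', 'b', 'c')

# To turn each element into its own str, convert them one by one.
print([str(n) for n in numbers])   # ['1', '3', '5', '7']
```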
Code points within a str can be referenced by 0-based index number from the left:
creative = '창의적인'
>>> creative[0]
'창'
>>> creative[2]
'적'
>>> creative[3]
'인'
Indexing also works from the right, starting with a -1-based index:
creative = '창의적인'
>>> creative[-4]
'창'
>>> creative[-2]
'적'
>>> creative[-1]
'인'
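A negative index -k names the same position as len(<str>) - k, and indexing outside the string in either direction raises IndexError. A quick check, reusing the creative string from above:

```python
creative = '창의적인'

# A negative index -k refers to the same position as len(creative) - k.
for k in range(1, len(creative) + 1):
    assert creative[-k] == creative[len(creative) - k]

# Indexing past either end raises IndexError.
try:
    creative[4]
except IndexError as error:
    print("IndexError:", error)   # IndexError: string index out of range
```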
There is no separate "character" or "rune" type in Python, so indexing a string produces a new str of length 1:
>>> website = "exercism"
>>> type(website[0])
<class 'str'>
>>> len(website[0])
1
>>> website[0] == website[0:1] == 'e'
True
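Because each item is just a str of length 1, the built-in ord() and chr() functions can be used to move between a code point's character and its integer value. A small illustration:

```python
website = "exercism"

# Iterating a str yields str objects, each of length 1.
print(all(isinstance(letter, str) and len(letter) == 1 for letter in website))  # True

# ord() returns the integer code point of a length-1 string; chr() goes back.
print(ord("e"))      # 101
print(chr(101))      # e
print(ord("창"))     # 52285, which is 0xCC3D
```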
Substrings can be selected via slice notation, using <str>[<start>:<stop>:<step>] to produce a new string.
Results exclude the stop index.
If no start is given, the starting index will be 0.
If no stop is given, the stop index will be the end of the string.
moon_and_stars = '🌟🌟🌙🌟🌟⭐'
>>> moon_and_stars[1:4]
'🌟🌙🌟'
>>> moon_and_stars[:3]
'🌟🌟🌙'
>>> moon_and_stars[3:]
'🌟🌟⭐'
>>> moon_and_stars[:-1]
'🌟🌟🌙🌟🌟'
>>> moon_and_stars[:-3]
'🌟🌟🌙'
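The <step> part of the slice is not used above. A brief sketch of stepping, reversing, and out-of-range slices, using the website string from the earlier example:

```python
website = "exercism"

# A step of 2 keeps every second code point, starting at index 0.
print(website[::2])     # eecs

# A negative step walks backwards; [::-1] returns a reversed copy.
print(website[::-1])    # msicrexe

# Unlike indexing, slicing past the end is not an error: the bounds are clamped.
print(website[5:100])   # ism
print(website[10:])     # prints an empty line, because the result is ''
```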
Strings can also be broken into smaller strings via <str>.split(<separator>), which will return a list of substrings.
Using <str>.split() without any arguments will split the string on whitespace.
>>> cat_ipsum = "Destroy house in 5 seconds command the hooman."
>>> cat_ipsum.split()
['Destroy', 'house', 'in', '5', 'seconds', 'command', 'the', 'hooman.']
>>> cat_words = "feline, four-footed, ferocious, furry"
>>> cat_words.split(',')
['feline', ' four-footed', ' ferocious', ' furry']
>>> colors = """red,
orange,
green,
purple,
yellow"""
>>> colors.split(',\n')
['red', 'orange', 'green', 'purple', 'yellow']
Strings can be concatenated using the + operator.
This approach should be used sparingly: because strings are immutable, every + builds a new str, which adds up when repeated in a loop, and long chains of + are hard to read and maintain.
language = "Ukrainian"
number = "nine"
word = "дев'ять"
sentence = word + " " + "means" + " " + number + " in " + language + "."
>>> print(sentence)
дев'ять means nine in Ukrainian.
If a list, tuple, set or other collection of individual strings needs to be combined into a single str, <str>.join(<iterable>) is a better option:
# str.join() makes a new string from the iterable's elements.
>>> chickens = ["hen", "egg", "rooster"]
>>> ' '.join(chickens)
'hen egg rooster'
# Any string can be used as the joining element.
>>> ' :: '.join(chickens)
'hen :: egg :: rooster'
>>> ' 🌿 '.join(chickens)
'hen 🌿 egg 🌿 rooster'
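join() only accepts an iterable of str, so other values need converting first. The same method also covers the common case of building text in a loop: collect the pieces in a list and join once at the end instead of repeatedly applying +. A sketch with illustrative variable names:

```python
numbers = [1, 2, 3]

# join() only accepts iterables of str; other item types raise TypeError.
try:
    "-".join(numbers)
except TypeError as error:
    print("TypeError:", error)

# Converting each item first makes join work.
print("-".join(str(n) for n in numbers))   # 1-2-3

# A str is itself an iterable of strings, so it can be joined too.
print(".".join("abc"))                     # a.b.c

# Building text piece by piece: collect parts, then join once.
parts = []
for count in range(3):
    parts.append(f"item {count}")
print(", ".join(parts))                    # item 0, item 1, item 2
```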
Strings support all common sequence operations.
Individual code points can be iterated through in a loop via for item in <str>.
Index positions along with their items can be iterated through in a loop via for index, item in enumerate(<str>).
>>> exercise = 'လေ့ကျင့််'
# Note that there are more code points than perceived glyphs or characters
>>> for code_point in exercise:
...     print(code_point)
...
လ
ေ
့
က
ျ
င
်
့
# Using enumerate will give both the value and index position of each element.
>>> for index, code_point in enumerate(exercise):
...     print(index, ":", code_point)
...
0 : လ
1 : ေ
2 : ့
3 : က
4 : ျ
5 : င
6 : ်
7 : ့
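To see which of these code points are combining marks rather than base letters, the standard-library unicodedata module can report each one's general category and name. This is a rough inspection sketch, not a way to count rendered glyphs (that would need grapheme-cluster segmentation):

```python
import unicodedata

# The same eight code points as the exercise string above, written as escapes.
exercise = "\u101c\u1031\u1037\u1000\u103b\u1004\u103a\u1037"

# General categories starting with "M" are marks (vowel signs, tone marks, etc.)
# that attach to a preceding letter; "Lo" is "Letter, other".
for code_point in exercise:
    print(f"U+{ord(code_point):04X}", unicodedata.category(code_point), unicodedata.name(code_point))

# Keep only the code points that are not marks.
letters = [cp for cp in exercise if not unicodedata.category(cp).startswith("M")]
print(len(exercise), len(letters))
```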
Python provides a rich set of string methods that can assist with searching, cleaning, splitting, transforming, translating, and many other operations. A selection of these methods is covered in another exercise.
Python also provides a rich set of tools for formatting and templating strings, as well as more sophisticated text processing through the re (regular expressions), difflib (sequence comparison), and textwrap modules. For a great introduction to string formatting in Python, see this post at Real Python. For an introduction to string methods, see Strings and Character Data in Python at the same site.
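As a small taste of those tools, here is the Ukrainian sentence from the concatenation example rebuilt with an f-string and with str.format(), plus a textwrap call on an arbitrary example sentence:

```python
import textwrap

language = "Ukrainian"
number = "nine"
word = "дев'ять"

# An f-string evaluates the names inside {} and inserts their values.
print(f"{word} means {number} in {language}.")
# дев'ять means nine in Ukrainian.

# str.format() fills {} placeholders positionally.
print("{} means {} in {}.".format(word, number, language))

# textwrap.wrap() breaks text into lines of at most `width` characters.
print(textwrap.wrap("The quick brown fox jumps", width=10))
# ['The quick', 'brown fox', 'jumps']
```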
In addition to str (a text sequence), Python has corresponding binary sequence types summarized under binary data services: bytes (a binary sequence), bytearray, and memoryview for the efficient storage and handling of binary data.
Additionally, asyncio Streams allow sending and receiving binary data over a network connection without using callbacks.
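The link between str and bytes is encoding: str.encode() produces bytes, and bytes.decode() turns them back into text. A minimal sketch, assuming UTF-8 (the most common encoding):

```python
word = "창의적인"

# Encoding turns text (str) into bytes; UTF-8 uses 1 to 4 bytes per code point.
encoded = word.encode("utf-8")
print(len(word), len(encoded))        # 4 12
print(type(encoded))                  # <class 'bytes'>

# Decoding with the same codec restores the original str.
print(encoded.decode("utf-8") == word)   # True

# bytearray is the mutable counterpart of bytes.
buffer = bytearray(b"abc")
buffer[0] = ord("A")
print(buffer)                         # bytearray(b'Abc')
```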