A `str` is an immutable sequence of Unicode code points.
This may include letters, diacritical marks, positioning characters, numbers, currency symbols, emoji, punctuation, spaces and line breaks, and more.
Strings implement all common sequence operations and can be iterated through using `for item in <str>` or `for index, item in enumerate(<str>)` syntax.
Individual code points (strings of length 1) can be referenced by 0-based index number from the left, or -1-based index number from the right.
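A minimal sketch of the indexing and iteration just described; the sample word is illustrative. It also hints at why strings are defined in terms of *code points*: a letter written as a base letter plus a combining accent occupies two index positions.

```python
# 'café', written with é as a single precomposed code point (U+00E9).
word = 'caf\u00e9'

# Indexing: 0-based from the left, -1-based from the right.
first = word[0]    # 'c'
last = word[-1]    # 'é'

# Iterating over the code points directly...
letters = [char for char in word]

# ...or together with their index via enumerate().
positions = [(index, char) for index, char in enumerate(word)]

# Indexes count code points, not visual characters: here the final 'é' is
# 'e' followed by U+0301 COMBINING ACUTE ACCENT, so the string is one longer.
decomposed = 'cafe\u0301'
print(len(word), len(decomposed))  # 4 5
print(decomposed[-1] == '\u0301')  # True
```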
Strings can be concatenated with `+` or via `<str>.join(<iterable>)`, split via `<str>.split(<separator>)`, and offer multiple formatting and assembly options.
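A short sketch of concatenation, `join()`, and `split()`, using made-up sample data:

```python
# Concatenation with + builds a new str from existing ones.
greeting = 'Hello' + ', ' + 'World'
print(greeting)  # Hello, World

# join() is called on the separator and takes an iterable of strings.
words = ['red', 'green', 'blue']
csv_line = ','.join(words)
print(csv_line)  # red,green,blue

# split() goes the other way and returns a list of substrings.
print(csv_line.split(','))  # ['red', 'green', 'blue']

# With no argument, split() splits on runs of whitespace and drops empty pieces.
print('  one   two\tthree\n'.split())  # ['one', 'two', 'three']

# join() only accepts strings, so other types must be converted first.
numbers = [1, 2, 3]
print('-'.join(str(n) for n in numbers))  # 1-2-3
```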
To further work with strings, Python provides a rich set of string methods for searching, cleaning, transforming, translating, and many other operations.
Some of the more commonly used `str` methods include:

- `<str>.startswith(<substr>)` and `<str>.endswith(<substr>)`
- `<str>.title()`, `<str>.upper()`/`<str>.lower()`, and `<str>.swapcase()`
- `<str>.strip(<chars>)`, `<str>.lstrip(<chars>)`, or `<str>.rstrip(<chars>)`
- `<str>.replace(<old>, <new>)`

Being immutable, a `str` object's value in memory cannot change; methods that appear to modify a string instead return a new `str` object.
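This immutability can be observed directly. In the sketch below, calling a method leaves the original string untouched, and trying to assign to an index raises a `TypeError`:

```python
original = 'hello'

# Methods return a new str; the original object is left unchanged.
shouted = original.upper()
print(original, shouted)  # hello HELLO

# Item assignment is not supported on str objects.
try:
    original[0] = 'j'
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# To "change" a string, build a new one (here by slicing) and bind it to a name.
modified = 'j' + original[1:]
print(modified)  # jello
```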
`<str>.endswith(<suffix>)` returns `True` if the string ends with `<suffix>`, `False` otherwise.
```python
>>> 'My heart breaks. 💔'.endswith('💔')
True
>>> 'cheerfulness'.endswith('ness')
True
# Punctuation is part of the string, so it needs to be included in any endswith match.
>>> 'Do you want to 💃?'.endswith('💃')
False
>>> 'The quick brown fox jumped over the lazy dog.'.endswith('dog')
False
```
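`startswith()` from the method list works the same way as `endswith()`, matching at the front instead. Both methods also accept a tuple of candidates and optional start/end positions; the file name here is only sample data:

```python
filename = 'report_2024.csv'

# startswith() mirrors endswith() and is likewise case-sensitive.
print(filename.startswith('report'))  # True
print(filename.startswith('Report'))  # False

# Both methods accept a tuple of candidates and return True if any one matches.
print(filename.endswith(('.csv', '.tsv')))      # True
print(filename.startswith(('draft', 'final')))  # False

# An optional start index (and end index) limits the check to a slice.
print(filename.startswith('2024', 7))  # True
```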
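`<str>.title()` parses a string and capitalizes the first "character" of each "word" found.
Python uses a simple, language-independent definition of a "word" (a group of consecutive letters) and applies the Unicode case mapping of each code point, so the result depends heavily on how a particular script represents words and characters.
Scripts without upper- and lowercase forms, such as Thai or Korean, come back unchanged, and locale-specific capitalization rules are not applied.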
```python
>>> man_in_hat_th = 'ผู้ชายใส่หมวก'
>>> man_in_hat_ru = 'мужчина в шляпе'
>>> man_in_hat_ko = '모자를 쓴 남자'
>>> man_in_hat_en = 'the man in the hat.'
>>> man_in_hat_th.title()
'ผู้ชายใส่หมวก'
>>> man_in_hat_ru.title()
'Мужчина В Шляпе'
>>> man_in_hat_ko.title()
'모자를 쓴 남자'
>>> man_in_hat_en.title()
'The Man In The Hat.'
```
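The other case methods from the list are more predictable than `title()`. This sketch shows them alongside two edge cases: `title()` starting a new "word" after an apostrophe, and an uppercase conversion that changes the string's length:

```python
phrase = 'The Man in the Hat'

print(phrase.upper())     # THE MAN IN THE HAT
print(phrase.lower())     # the man in the hat
print(phrase.swapcase())  # tHE mAN IN THE hAT

# title() treats each run of letters as a word, so an apostrophe
# splits a word and the letter after it is capitalized as well.
print("they're bill's friends".title())  # They'Re Bill'S Friends

# Case mapping is not always length-preserving: German ß uppercases to 'SS'.
print('straße'.upper())  # STRASSE
```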
`<str>.strip(<chars>)` returns a copy of the `str` with leading and trailing `<chars>` removed.
The code points specified in `<chars>` are not treated as a prefix or suffix: any combination of them is removed, working inward from both ends of the string until a code point not in `<chars>` is reached.
If nothing is specified for `<chars>`, all whitespace code points are removed from both ends.
If only left-side or right-side removal is wanted, `<str>.lstrip(<chars>)` and `<str>.rstrip(<chars>)` can be used.
```python
# This will remove "https://", because it can be formed from "/stph:".
>>> 'https://unicode.org/emoji/'.strip('/stph:')
'unicode.org/emoji'
# Removal of all whitespace from both ends of the str.
>>> ' 🐪🐪🐪🌟🐪🐪🐪 '.strip()
'🐪🐪🐪🌟🐪🐪🐪'
>>> justification = 'оправдание'
>>> justification.strip('еина')
'оправд'
# Prefix and suffix in one step.
>>> 'unaddressed'.strip('dnue')
'address'
>>> ' unaddressed '.strip('dnue ')
'address'
```
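The one-sided variants follow the same rules as `strip()`. The strings below are arbitrary samples, and `repr()` is used only so that the remaining whitespace is visible:

```python
padded = '   🐪 caravan 🐪   '

print(repr(padded.lstrip()))  # '🐪 caravan 🐪   '
print(repr(padded.rstrip()))  # '   🐪 caravan 🐪'

# <chars> acts as a set of code points, so order and repetition do not matter.
print('xxyxhelloyxy'.strip('xy'))   # hello
print('xxyxhelloyxy'.lstrip('yx'))  # helloyxy
print('xxyxhelloyxy'.rstrip('y'))   # xxyxhelloyx
```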
`<str>.replace(<substring>, <replacement substring>)` returns a copy of the string with all occurrences of `<substring>` replaced with `<replacement substring>`.
The quote used below is from *The Hunting of the Snark* by Lewis Carroll.

```python
>>> quote = '''
"Just the place for a Snark!" the Bellman cried,
 As he landed his crew with care;
Supporting each man on the top of the tide
 By a finger entwined in his hair.

"Just the place for a Snark! I have said it twice:
 That alone should encourage the crew.
Just the place for a Snark! I have said it thrice:
 What I tell you three times is true."
'''
>>> quote.replace('Snark', '🐲')
'\n"Just the place for a 🐲!" the Bellman cried,\n As he landed his crew with care;\nSupporting each man on the top of the tide\n By a finger entwined in his hair.\n\n"Just the place for a 🐲! I have said it twice:\n That alone should encourage the crew.\nJust the place for a 🐲! I have said it thrice:\n What I tell you three times is true."\n'
>>> 'bookkeeper'.replace('kk', 'k k')
'book keeper'
```
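`replace()` also accepts an optional third argument, `count`, which limits how many occurrences are replaced, counting from the left. Because each call returns a new string, calls can also be chained. The sample strings are illustrative:

```python
text = 'one fish two fish red fish blue fish'

# Only the first two occurrences are replaced.
print(text.replace('fish', '🐟', 2))
# one 🐟 two 🐟 red fish blue fish

# Each call returns a new str, so replacements can be chained.
print('a-b_c'.replace('-', ' ').replace('_', ' '))  # a b c

# Matching is exact and case-sensitive.
print('Snark snark'.replace('snark', 'Boojum'))  # Snark Boojum
```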
:star: Newly added in Python 3.9

Python 3.9 introduces two new string methods that make removing prefixes and suffixes much easier.
`<str>.removeprefix(<substring>)` returns the string without the prefix (`<str>[len(<substring>):]`).
If the `<substring>` isn't present, a copy of the original string is returned.
```python
# removing a prefix
>>> 'TestHook'.removeprefix('Test')
'Hook'
>>> 'bookkeeper'.removeprefix('book')
'keeper'
```
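`removeprefix()` is not the same as `lstrip()`: the former removes one exact substring, while the latter strips any run of the given characters. A quick comparison (requires Python 3.9 or later):

```python
word = 'bookkeeper'

# removeprefix() removes the exact substring once, if it is present.
print(word.removeprefix('book'))  # keeper

# lstrip() treats 'book' as the character set {'b', 'o', 'k'}
# and keeps stripping, so the second 'k' is removed too.
print(word.lstrip('book'))  # eeper

# A prefix that is not present leaves the string unchanged.
print(word.removeprefix('keeper'))  # bookkeeper
```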
`<str>.removesuffix(<substring>)` returns the string without the suffix (`<str>[:-len(<substring>)]`, for a non-empty `<substring>`).
If the `<substring>` isn't present, a copy of the original string is returned.
```python
# removing a suffix
>>> 'TestHook'.removesuffix('Hook')
'Test'
>>> 'bookkeeper'.removesuffix('keeper')
'book'
```
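Code that must also run on Python versions older than 3.9 can fall back on the slicing expressions given above. This is a minimal sketch; `remove_prefix` and `remove_suffix` are hypothetical helper names, not part of the standard library:

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Hypothetical stand-in for str.removeprefix() on Python < 3.9."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text: str, suffix: str) -> str:
    """Hypothetical stand-in for str.removesuffix() on Python < 3.9."""
    # The non-empty check matters: text[:-0] is text[:0], which is ''.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(remove_prefix('TestHook', 'Test'))      # Hook
print(remove_suffix('bookkeeper', 'keeper'))  # book
print(remove_suffix('bookkeeper', ''))        # bookkeeper
```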
For more examples and methods, the informal tutorial is a nice jumping-off point. The Unicode HOWTO in the Python docs offers great detail on Unicode, encoding, bytes, and other technical considerations for working with strings in Python.
Python also supports regular expressions via the `re` module, which will be covered in a future exercise.