You have identified a gap in the social media market for very, very short posts. Now that Twitter allows 280-character posts, people wanting quick social media updates aren't being served. You decide to create your own social media network.
To make your product noteworthy, you make it extreme and only allow posts of 5 or fewer characters. Any post of more than 5 characters should be truncated to 5.
To allow your users to express themselves fully, you allow Emoji and other Unicode.
The task is to truncate input strings to 5 characters.
Text stored digitally has to be converted to a series of bytes. There are 3 ways to map characters to bytes in common use: ASCII, UTF-8, and UTF-16.
UTF-8 and UTF-16 are both Unicode encodings, which means they're capable of representing a massive range of characters, including emoji and text in most of the world's languages and scripts.
UTF-8 and UTF-16 are both variable length encodings, which means that different characters take up different amounts of space.
Consider the letter 'a' and the emoji '😛'. In UTF-16 the letter takes 2 bytes but the emoji takes 4 bytes.
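The size difference can be checked directly. The following is a minimal Python sketch, not part of the exercise itself; the helper name `byte_len` is hypothetical, and `utf-16-le` is used so that Python does not prepend a byte order mark to the encoded result.

```python
# Compare how many bytes one character occupies in UTF-8 and UTF-16.
# byte_len is a hypothetical helper introduced only for this illustration.

def byte_len(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))

letter = "a"
emoji = "\U0001F61B"  # the emoji U+1F61B, which lies outside the Basic Multilingual Plane

for ch in (letter, emoji):
    # "utf-16-le" avoids the 2-byte byte order mark that plain "utf-16" adds
    print(repr(ch), byte_len(ch, "utf-8"), byte_len(ch, "utf-16-le"))
```

The letter needs 1 byte in UTF-8 and 2 bytes in UTF-16, while the emoji needs 4 bytes in both, because UTF-16 stores it as a pair of 2-byte code units.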
The trick to this exercise is to use APIs designed around Unicode characters (code points) instead of Unicode code units.
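As a rough illustration of the difference, here is a sketch in Python, where `str` is indexed by code point. The function names `truncate` and `truncate_utf16_units` are hypothetical, and the second function is a deliberately naive version included only to show what goes wrong when cutting by code units.

```python
# Truncating by code points versus by UTF-16 code units.
# Both function names are hypothetical, chosen for this illustration.

def truncate(text: str, limit: int = 5) -> str:
    """Keep at most `limit` code points.

    Python strings are sequences of code points, so a slice
    never cuts a character in half.
    """
    return text[:limit]

def truncate_utf16_units(text: str, limit: int = 5) -> str:
    """Naive version: keep at most `limit` UTF-16 code units (2 bytes each).

    A character stored as a surrogate pair can be split, leaving an
    invalid half that the decoder has to replace.
    """
    data = text.encode("utf-16-le")[: limit * 2]
    return data.decode("utf-16-le", errors="replace")

post = "\U0001F61B" * 6          # six emoji, each two UTF-16 code units
print(truncate(post))             # five intact emoji
print(truncate_utf16_units(post)) # fewer emoji, with a damaged trailing character
```

For plain ASCII input such as `"Hello, world"`, both functions return `"Hello"`; they only diverge once a character needs more than one code unit.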