Strings
Strings in programming are simply text: individual characters, words, phrases, or complete sentences. They are among the most common elements in programming, especially when it comes to interacting with the user. Because they are so common, they are a native data type in Python, with many powerful capabilities built in. Unlike in some other languages, you don't have to write these capabilities yourself. This is good because the built-in ones have been tested many times over and have been optimized for performance and stability.
Strings in Python differ from strings in most other languages. First off, there is no char type, only single-character strings. Strings also can't be changed in place; a new string object is created whenever you make a change, such as concatenation. In other words, you are never manipulating the original string in memory; it doesn't get changed or deleted as you work with it. You simply create a new string each time.
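To make the "no in-place changes" point concrete, here is a minimal sketch. It assumes Python 3 (the chapter's own examples use Python 2 syntax), and the variable names are illustrative only.

```python
# Strings are immutable: item assignment fails, and "changing" a string
# really builds a new string object.
greeting = "spam"

try:
    greeting[0] = "S"           # in-place change is not allowed
    changed_in_place = True
except TypeError:
    changed_in_place = False

shouted = greeting + "!"        # concatenation creates a new string
single = greeting[0]            # a "character" is a string of length 1

print(changed_in_place)                     # False
print(greeting, shouted)                    # spam spam!
print(type(single).__name__, len(single))   # str 1
```

The original value of greeting is untouched after the concatenation, which is the behavior described above.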
Here's a list of common string operations:
• s1 = '' : empty string
• s2 = "knight's" : double quotes
• block = """ - """ : triple-quoted block
• s1 + s2 : concatenate
• s2 * 3 : repeat
• s2[n] : index
• len(s2) : length
• "a %s parrot" %'dead' : string formatting
• for x in s2 : iteration
• 'm' in s2 : membership
Empty strings are written as two quotes with nothing in between. The quotes used can be either single or double; my preference is to use double quotes since you don't have to escape the single quote to use it in a string. That means you can write a statement like
"And then he said, 'No way' when I told him."
If you want to use just one type of quote mark all the time, you have to use the backslash character to "escape" the inner quote marks so Python doesn't think it has reached the end of the string, like this:
"And then he said, \"No way\" when I told him."
Triple-quoted blocks are for strings that span multiple lines, as shown in the last chapter. Python collects the entire text block into a single string with embedded newline characters. This is good for things like writing short paragraphs of text, e.g. instructions, or for formatting your source code for clarification.
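The quoting options described above can be compared side by side. This is a small sketch assuming Python 3; the variable names and the text of the instructions are made up for illustration.

```python
# Three ways to write a sentence that contains quote marks.
mixed = "And then he said, 'No way' when I told him."
escaped = "And then he said, \"No way\" when I told him."
single_quoted = 'And then he said, "No way" when I told him.'

# A triple-quoted block keeps its line breaks as embedded newlines.
instructions = """Step 1: open the tin.
Step 2: eat the spam."""

print(escaped == single_quoted)    # True
print(instructions.count("\n"))    # 1
print(repr(instructions))          # shows the embedded \n
```

Escaping and switching the outer quote type produce exactly the same string; which one to use is a matter of readability.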
Basic string operations
The "+" and "*" operators are overloaded in Python, letting you concatenate and repeat string objects, respectively. Overloading is just using the same operator to do multiple things, based on the situation where it’s used. For example, the “+” symbol can mean addition when two numbers are involved or, as in this case, combining strings.
Concatenation combines two (or more) strings into a new string object whereas repeat simply repeats a given string a given number of times. Here are some examples:
Generic Code Example:
>>> len('abc') # length: number of items
3
>>> 'abc' + 'def' # concatenation: a new string
'abcdef'
>>> 'Ni!' * 4 # like "Ni!" + "Ni!" + ...
'Ni!Ni!Ni!Ni!'
You need to be aware that Python doesn't automatically change a number to a string, so writing 'spam' + 3 will give you an error. To explicitly tell Python that a number should be treated as a string, convert it with str(). This is similar to casting values in C/C++. The result is not an integer or floating-point number but a text representation of the number. Just remember that you can no longer perform arithmetic with it; it's strictly text.
Generic Code Example:
>>> str(3) # converts number to string
'3'
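The following sketch shows the error and the fix together. It assumes Python 3, and the names label, text, and number are illustrative.

```python
# Mixing a string and a number raises TypeError; str() makes it explicit.
try:
    label = "spam" + 3
except TypeError:
    label = "spam" + str(3)

text = str(3)        # text representation of the number
number = int(text)   # converting back gives a number again

print(label)         # spam3
print(text * 2)      # 33  (string repetition, not arithmetic)
print(number * 2)    # 6   (ordinary multiplication)
```

Note how the * operator repeats the text version but multiplies the numeric version, which is the overloading described earlier.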
Iteration over strings is a little different than in other languages. Rather than writing a loop that indexes into the string to reach each character, you can step through a string directly with a for statement, because a string is a built-in sequence type. Here's an example followed by an explanation:
Generic Code Example:
>>> myjob = "lumberjack"
>>> for c in myjob: print c, # step through items
...
l u m b e r j a c k
>>> "k" in myjob # True means found
True
Essentially, Python goes through the variable myjob sequentially and prints each character in the string. For statements will be covered in depth later in the book, but for now just be aware that they are what you use to step through a sequence of values. As you can see, they can be used for strings or, more often, numbers.
The second example is simply a membership test. Does the letter "k" exist in the value stored by myjob? If yes, Python returns True. If "k" didn't exist, it would return False. (Python versions before 2.3 returned 1 and 0 instead.) This test is most often used in text-processing applications, though you can probably think of other situations where it would be useful.
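For readers on Python 3, where print is a function, the same iteration and membership test might look like the sketch below. The extra names (letters, has_k, has_z) are assumptions added for illustration.

```python
myjob = "lumberjack"

# Step through the string one character at a time.
for c in myjob:
    print(c, end=" ")   # end=" " keeps the output on one line
print()

letters = list(myjob)   # the same iteration, collected into a list

# Membership tests return a Boolean value.
has_k = "k" in myjob
has_z = "z" in myjob
print(has_k, has_z)     # True False
```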
Indexing and slicing strings
Strings in Python are handled similarly to arrays in C. Unlike C arrays, however, characters within a string can be accessed from either end. Counting from the front, a string starts at position 0 and the desired character is found via an offset value. However, you can also find a character by using a negative offset from the end of the string. I won't go deeply into it, but here's a quick example:
Generic Code Example:
>>> S = "spam"
>>> S[0], S[-2] # indexing from the front and rear
('s', 'a')
Indexing is simply telling Python where a character can be found within the string. Like many other languages, Python starts counting at 0 instead of 1. So the first character’s index is 0, the second character’s index is 1, and so on. It’s the same counting backwards through the string, except that the last letter’s index is -1 instead of 0 (since 0 is already taken). Therefore, to index the final letter you would use -1, the second to the last letter is -2, etc. Knowing the index of a character is important for slicing.
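Here is a short sketch of positive and negative indexing on the same string, assuming Python 3. It also shows what happens when an index falls outside the string; the names are illustrative.

```python
S = "spam"

first = S[0]             # 's'  (counting from the front starts at 0)
last = S[-1]             # 'm'  (counting from the rear starts at -1)
second_to_last = S[-2]   # 'a'

# A negative index is shorthand for len(S) plus that index.
same_char = S[-1] == S[len(S) - 1]

# Valid indexes for a 4-letter string are 0..3 and -4..-1.
try:
    S[4]
    in_range = True
except IndexError:
    in_range = False

print(first, last, second_to_last)   # s m a
print(same_char, in_range)           # True False
```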
Slicing a string is basically what it sounds like: by giving upper and lower index values, we can pull out just the characters we want. A great example of this is when processing an input file where each line is terminated with a newline character; just slice off the last character and process each line. You could also use it to process command-line arguments by "filtering" out the program name. Again, here's an example:
Generic Code Example:
>>> S = "spam"
>>> S[1:3], S[1:], S[:-1] # slicing: extract section
('pa', 'pam', 'spa')
You'll notice that the colon symbol is used when slicing. The colon acts as a separator between the lower and upper index values. If the lower value is omitted, the slice starts at the beginning of the string; if the upper value is omitted, the slice runs to the end of the string. In the example above, the first slice is from index 1 (the second letter, inclusive) to index 3 (the fourth letter, exclusive). You can think of each index as pointing at the space before a letter; that's why the letter "p" is included in the slice but the letter "m" isn't.
The second slice runs from index 1 to the end of the string, which returns everything from the second letter onward. The third slice starts at the beginning of the string and stops just before the last letter, because -1 is the index of the final character and the upper bound is exclusive.
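The two use cases mentioned earlier (dropping a trailing newline and skipping the program name) can be sketched with slices as follows. This assumes Python 3; the sample line and the argv list are made-up stand-ins, not real input.

```python
S = "spam"
middle = S[1:3]      # 'pa'  : index 1 up to, but not including, 3
tail = S[1:]         # 'pam' : index 1 to the end
head = S[:-1]        # 'spa' : start up to, but not including, the last

# Drop the newline that terminates a line read from a file.
line = "first record\n"          # hypothetical line of input
trimmed = line[:-1]

# Skip the program name in a command-line argument list.
argv = ["myscript.py", "-v", "input.txt"]   # stand-in for sys.argv
args = argv[1:]

print(middle, tail, head)   # pa pam spa
print(repr(trimmed))        # 'first record'
print(args)                 # ['-v', 'input.txt']
```

Slicing off the last character removes it whatever it is, so this approach only makes sense when every line is known to end in a newline.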
String Formatting
Formatting strings is simply a way of presenting the information on the screen in a way that conveys the information best. Some examples of formatting are creating column headers, dynamically creating a sentence from a list or stored variable, or stripping extraneous information from the strings, such as excess spaces.
Python supports the creation of dynamic strings. What this means is that you can create a variable containing a value of some type (such as a string or number) and then "call" that value into your string. If you choose to, you can use the same format codes as C's printf, such as %d for integers and %f for floating-point numbers. Here's an example:
Generic Code Example:
>>> s = "parrot"
>>> d = 1
>>> print "That is %d dead %s!" % (d, s)
That is 1 dead parrot!
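A few more format codes, including the column-header case mentioned above, are sketched here. The code assumes Python 3 (where the % operator still works), and the price and header values are invented for illustration.

```python
s = "parrot"
d = 1

sentence = "That is %d dead %s!" % (d, s)

# %f formats floating-point numbers; .2 limits it to two decimals.
price = "%.2f" % 3.14159

# A width pads the value: %-8s left-justifies in 8 columns,
# %5s right-justifies in 5 columns.
header = "%-8s%5s" % ("Item", "Qty")

print(sentence)       # That is 1 dead parrot!
print(price)          # 3.14
print(repr(header))   # repr() makes the padding visible
```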
Python also has a string utility module for tools such as case conversion, converting strings to numbers, etc. Here's yet another example:
Generic Code Example:
>>> import string # standard utilities module
>>> S = "spammify"
>>> string.upper(S) # convert to uppercase
'SPAMMIFY'
>>> string.find(S, "mm") # return index of substring
3
>>> string.atoi("42"), `42` # convert from/to string
(42, '42')
>>> string.join(string.split(S, "mm"), "XX")
'spaXXify'
Notice the second-to-last example. Backquotes convert an object into its string representation. This is one way around the "don't mix strings and numbers" problem from earlier. I'll leave the last example above as a mental test. See if you can figure out what the statement is doing.
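The string-module functions and the backquote syntax shown above exist only in Python 2. As a rough guide for Python 3 readers, the equivalents are string methods plus the built-in int() and repr() functions. The sketch below deliberately leaves out the last example so the mental test stays intact.

```python
S = "spammify"

upper = S.upper()         # replaces string.upper(S)
position = S.find("mm")   # replaces string.find(S, "mm")
missing = S.find("zz")    # find() returns -1 when nothing matches
number = int("42")        # replaces string.atoi("42")
text = repr(42)           # replaces the backquotes around 42

print(upper)              # SPAMMIFY
print(position, missing)  # 3 -1
print((number, text))     # (42, '42')
```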
Though it's not strictly a string operation (it can be used with just about anything that can be measured), the len() function can be used to give you the length of a string. For example,
>>> title = "The Life of Brian"
>>> print len(title)
17
>>> len("The Meaning of Life")
19
As shown in the second example above, you don't necessarily have to use a print statement (or the print function in Python 3) to display a value at the interactive prompt. Simply typing an expression displays its result. However, this only works in the interactive interpreter, and it doesn't always work in your favor. Sometimes the object will only show its type and a memory address, as we will see later in the book. Generally speaking, it's easier to explicitly state "print" if you want a statement evaluated and printed out. Otherwise you don't know exactly what will be displayed.
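Here is a small sketch of len() on several kinds of objects, along with the "memory address" display mentioned above. It assumes Python 3 and the standard CPython interpreter; the Knight class is a hypothetical example that is not part of this book's code.

```python
title = "The Life of Brian"

print(len(title))               # 17
print(len(""))                  # 0  (the empty string)
print(len(["spam", "eggs"]))    # 2  (len() also works on lists)


class Knight:
    """Hypothetical class with no custom string representation."""


# With no custom representation, the default one shows the type
# and a memory address, e.g. <... Knight object at 0x...>.
default_repr = repr(Knight())
print(default_repr)
```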
Combining and Separating Strings
Strings can be combined (joined) and separated (split) quite easily. Tokenization is the process of splitting something up into individual tokens; in this case, a sentence is split into individual words. When a web page is parsed by a browser, the HTML, JavaScript, and any other code in the page is tokenized and each token is identified as a keyword, operator, variable, etc. The browser then uses this information to display the web page correctly, or at least as well as it can.
Python does much the same thing (though with better results). The Python interpreter tokenizes the source code and identifies the parts that are part of the actual programming language and the parts that are data. The individual tokens are separated by delimiters, characters that actually separate one token from another.
In strings, the main delimiter is a whitespace character, such as a tab, a newline, or an actual space. These delimiters mark off individual characters or words, sentences, and paragraphs. When special formatting is needed, other delimiters can be specified by the programmer.
Joining strings combines separate strings into one string. Because string operations always create a new string, you don't have to worry about the original strings being overwritten. The catch is that join() doesn't simply concatenate the two strings. Instead, the string you call it on is inserted between every item of its argument, and when the argument is itself a string, every character (spaces included) counts as an item. Here's an example:
>>> string1 = "1 2 3"
>>> string2 = "A B C"
>>> string3 = string2.join(string1)
>>> print string3
1A B C A B C2A B C A B C3
As you can see, the results are not what you expect. However, when creating a complex string, it can be better to put the pieces into a list and then simply join them, rather than trying to concatenate them.
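The list-then-join approach might look like the following sketch. It assumes Python 3, and the word lists are invented for illustration.

```python
# join() puts the separator between the items of a sequence.
words = ["spam", "eggs", "ham"]
menu = ", ".join(words)

# A string argument is treated as a sequence of characters.
spaced = "-".join("abc")

# Build the pieces in a list, then join once at the end.
pieces = []
for n in range(3):
    pieces.append(str(n))   # join() needs strings, not numbers
digits = "".join(pieces)

print(menu)     # spam, eggs, ham
print(spaced)   # a-b-c
print(digits)   # 012
```

Joining with an empty separator, as in the last case, gives the same result as concatenating every piece in turn.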
Concatenation, by contrast, combines two or more strings end to end into a new, complete string. This is probably what you were thinking of when I talked about joining strings together.
>>> string1 + string2
'1 2 3A B C'
>>> "Navy gravy. " + "The finest gravy in the Navy."
'Navy gravy. The finest gravy in the Navy.'
Chances are you will use concatenation more often than joining. To me, it simply makes more sense than messing with join(). But, with practice, you may find joining to be easier or more efficient.
Finally, splitting strings separates them into their component parts. The result is a list containing the individual words or characters. Here are some examples:
>>> sentence = "My wife hates spam."
>>> sentence.split() # split string at whitespace
['My', 'wife', 'hates', 'spam.']
>>> new_string = "1, 2, 3"
>>> new_string.split(",") # split string at commas
['1', ' 2', ' 3']
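Notice the leading spaces left in the second and third items above. One way to clean them up is sketched below, assuming Python 3; the variable names are illustrative.

```python
new_string = "1, 2, 3"

raw = new_string.split(",")            # spaces stay with the items
clean = [p.strip() for p in raw]       # strip() removes them
by_sep = new_string.split(", ")        # or split on comma plus space
numbers = [int(p) for p in clean]      # convert the text to integers

# With no argument, split() breaks on any run of whitespace.
words = "My  wife\thates spam.\n".split()

print(raw)       # ['1', ' 2', ' 3']
print(clean)     # ['1', '2', '3']
print(numbers)   # [1, 2, 3]
print(words)     # ['My', 'wife', 'hates', 'spam.']
```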
As we move further into the Python language, we will look at these and other features of strings. Console programs will benefit most from learning how to use strings, but word processors are obviously another place where knowing how to manipulate strings will come in handy.
A handy reference of the most common string methods can be found in the appendix of this book.