1 of 143

Python – Strings and Regular Expressions

2 of 143

Python - Strings

  • Strings are amongst the most popular types in Python. We can create them simply by enclosing characters in quotes. Python treats single quotes the same as double quotes.
  • Creating strings is as simple as assigning a value to a variable. For example:

var1 = 'Hello World!'

var2 = "Python Programming"

3 of 143

Strings

  • Strings in Python are surrounded by either single quotation marks or double quotation marks.
  • 'hello' is the same as "hello".
  • You can display a string literal with the print() function:

print("Hello")
print('Hello')

Output:

Hello
Hello

4 of 143

Assign String to a Variable

  • Assigning a string to a variable is done with the variable name followed by an equal sign and the string:

a = "Hello"
print(a)

Output: Hello

5 of 143

Multiline Strings

  • You can assign a multiline string to a variable by using three quotes.
  • You can use three double quotes or three single quotes:

a = """Welcome to Kongu College,
Welcome civil engineers,
civil students are good students."""
print(a)

Output:

Welcome to Kongu College,
Welcome civil engineers,
civil students are good students.

6 of 143

String Concatenation

  • To concatenate, or combine, two strings you can use the + operator.

Example

  • Merge variable a with variable b into variable c:

a = "Hello"
b = "World"
c = a + b
print(c)

Output: HelloWorld

7 of 143

Example

  • To add a space between them, add a " ":

a = "Hello"
b = "World"
c = a + " " + b
print(c)

Output: Hello World

8 of 143

Multiply on Strings

  • You can create multiple copies of a string in Python with the multiplication operator (*).
  • Multiply the string to be copied by the number of copies required:

str2 = str1 * N

  • str2 is the new string that stores the result
  • str1 is the original string
  • N is the number of times you want to copy the string
  • The result of the multiplication is itself a string

9 of 143

# Original string

a = "Geeks"

# Multiply the string and store

# it in a new string

b = a*3

# Display the strings

print(a)

print(b)

Output:

Geeks

GeeksGeeksGeeks

10 of 143

Initializing the original string

# Original string

a = "Geeks"

N=3

# Multiply the string and store

# it in a new string

b = a* N

# Display the strings

print(a)

print(b)

Output:

Geeks

GeeksGeeksGeeks

11 of 143

Copying a string multiple times given in a list

  • If a string is a list element and we use the multiplication operator on the list, we get a new list that contains the same element repeated the specified number of times.

a = ["str1"] * N

a will be a list that contains "str1" N times.

12 of 143

# Initialize the list
a = ["Geeks"]

# No. of copies
N = 3

# Multiply the list and store
# the result in a new list
b = a * N

# Display the lists
print(a)
print(b)

Output:

['Geeks']
['Geeks', 'Geeks', 'Geeks']

13 of 143

Accessing Values in Strings:

  • Python does not support a character type; a single character is treated as a string of length one, and is therefore also a substring.
  • To access substrings, use square brackets with an index, or with a slice of indices:
  • Example:

var1 = 'Hello World!'
var2 = "Python Programming"

print("var1[0]:", var1[0])
print("var2[1:5]:", var2[1:5])

This will produce the following result:

var1[0]: H
var2[1:5]: ytho

14 of 143

Updating Strings:

  • You can "update" an existing string by (re)assigning a variable to another string. The new value can be related to its previous value or be a completely different string altogether. The original string object itself is never modified, because strings are immutable.
  • Example:

var1 = 'Hello World!'
print("Updated String :-", var1[:6] + 'Python')

This will produce the following result:

Updated String :- Hello Python

15 of 143

1. str = "Hello"
print(str*3)

Output:

HelloHelloHello

2. str1 = "Hello"
var = 7
str2 = str1 + var
print(str2)

Output: Error

TypeError: can only concatenate str (not "int") to str

16 of 143

3. str1 = "Hello"
var = 7
str2 = str1 + str(var)
print(str2)

Output:

Hello7

17 of 143

Escape Characters:

Backslash notation    Hexadecimal character    Description
\a                    0x07                     Bell or alert
\b                    0x08                     Backspace
\f                    0x0c                     Formfeed
\n                    0x0a                     Newline
\r                    0x0d                     Carriage return
\t                    0x09                     Tab
\v                    0x0b                     Vertical tab
\nnn                                           Octal notation, where each n is in the range 0-7
\xnn                                           Hexadecimal notation, where each n is in the range 0-9, a-f, or A-F

The notations \cx, \C-x, \M-\C-x (control and meta characters), \e (0x1b, Escape) and \s (0x20, Space) appear in some reference tables, but Python string literals do not recognize them as escape sequences.

18 of 143

String Special Operators: Assume string variable a holds 'Hello' and variable b holds 'Python'. Then:

Operator    Description                                                                    Example
+           Concatenation - joins the values on either side of the operator                a + b gives HelloPython
*           Repetition - creates a new string by concatenating copies of the same string   a*2 gives HelloHello
[]          Index - gives the character at the given index                                 a[1] gives e
[ : ]       Range slice - gives the characters in the given range                          a[1:4] gives ell
in          Membership - returns True if a substring exists in the given string            'H' in a gives True
not in      Membership - returns True if a substring does not exist in the given string    'M' not in a gives True
r/R         Raw string - suppresses the special meaning of escape characters               print(r'\n') prints \n and print(R'\n') prints \n
%           Format - performs string formatting                                            See the next section

19 of 143

String Formatting Operator:

Format Symbol    Conversion
%c               character
%s               string conversion via str() prior to formatting
%i               signed decimal integer
%d               signed decimal integer
%u               unsigned decimal integer
%o               octal integer
%x               hexadecimal integer (lowercase letters)
%X               hexadecimal integer (uppercase letters)
%e               exponential notation (with lowercase 'e')
%E               exponential notation (with uppercase 'E')
%f               floating point real number
%g               the shorter of %f and %e
%G               the shorter of %f and %E

20 of 143

Formatting Strings

21 of 143

Other supported symbols and functionality are listed in the following table:

Symbol    Functionality
*         argument specifies width or precision
-         left justification
+         display the sign
<sp>      leave a blank space before a positive number
#         add the octal prefix ('0o') or the hexadecimal prefix '0x' or '0X', depending on whether 'x' or 'X' was used
0         pad from left with zeros (instead of spaces)
%         '%%' leaves you with a single literal '%'
(var)     mapping variable (dictionary arguments)
m.n       m is the minimum total width and n is the number of digits to display after the decimal point (if applicable)
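The conversion symbols and flags listed for the % operator can be combined in a single format string. The following is a minimal sketch; the variable names and sample values are illustrative assumptions, not taken from the slides.

```python
# Sample values (assumed for illustration only).
name, marks = "Kongu", 91.4567

basic = "%s scored %d" % (name, marks)   # %d drops the fractional part
precise = "%8.2f" % marks                # m.n: width 8, 2 digits after the point
left = "%-8s|" % name                    # '-' left-justifies within 8 columns
padded = "%05d" % 42                     # '0' pads with zeros instead of spaces
signed = "%+d" % 42                      # '+' always shows the sign
hexed = "%#x" % 255                      # '#' adds the 0x prefix
by_key = "%(lang)s %(ver)d" % {"lang": "Python", "ver": 3}  # (var) mapping
percent = "%d%%" % 50                    # '%%' gives a literal percent sign

for text in (basic, precise, left, padded, signed, hexed, by_key, percent):
    print(text)
```

Newer code often uses str.format() or f-strings instead, but the % operator shown here is the one the tables describe.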

22 of 143

Triple Quotes:

  • Python's triple quotes come to the rescue by allowing strings to span multiple lines, including verbatim NEWLINEs, TABs, and any other special characters.
  • The syntax for triple quotes consists of three consecutive single or double quotes.

para_str = """this is a long string that is made up of several lines and non-printable characters such as TAB ( \t ) and they will show up that way when displayed. NEWLINEs within the string, whether explicitly given like this within the brackets [ \n ], or just a NEWLINE within the variable assignment will also show up. """

print(para_str)

23 of 143

Raw String:

  • Raw strings don't treat the backslash as a special character at all. Every character you put into a raw string stays the way you wrote it. First, a normal string:

print('C:\\nowhere')

This would print the following result:

C:\nowhere

Now let's make use of a raw string. We put the expression in r'expression' as follows:

print(r'C:\\nowhere')

This would print the following result:

C:\\nowhere

24 of 143

Unicode String:

  • In Python 3, every string (str) is a sequence of Unicode characters, so it can hold characters from most languages in the world. The u prefix is still accepted for compatibility with Python 2, where normal strings were 8-bit and Unicode strings were a separate type.

print(u'Hello, world!')

This would print the following result:

Hello, world!

25 of 143

Built-in String Methods:

capitalize() - Capitalizes first letter of string

center(width, fillchar) - Returns a space-padded string with the original string centered to a total of width columns

count(str, beg=0, end=len(string)) - Counts how many times str occurs in string, or in a substring of string if starting index beg and ending index end are given

decode(encoding='UTF-8', errors='strict') - Decodes the string using the codec registered for encoding. In Python 3 this is a method of bytes objects, not of str.

encode(encoding='UTF-8', errors='strict') - Returns encoded version of string; on error, default is to raise a ValueError (UnicodeError) unless errors is given with 'ignore' or 'replace'.

endswith(suffix, beg=0, end=len(string)) - Determines if string or a substring of string (if starting index beg and ending index end are given) ends with suffix; returns true if so, and false otherwise

startswith(str, beg=0, end=len(string)) - Determines if string or a substring of string (if starting index beg and ending index end are given) starts with substring str; returns true if so, and false otherwise

expandtabs(tabsize=8) - Expands tabs in string to multiple spaces; defaults to 8 spaces per tab if tabsize not provided

26 of 143

find(str, beg=0, end=len(string)) - Determines if str occurs in string, or in a substring of string if starting index beg and ending index end are given; returns index if found and -1 otherwise

rfind(str, beg=0, end=len(string)) - Same as find(), but searches backwards in string

index(str, beg=0, end=len(string)) - Same as find(), but raises an exception if str is not found

rindex(str, beg=0, end=len(string)) - Same as index(), but searches backwards in string

isalnum() - Returns true if string has at least 1 character and all characters are alphanumeric and false otherwise

isalpha() - Returns true if string has at least 1 character and all characters are alphabetic and false otherwise

isdigit() - Returns true if string contains only digits and false otherwise

27 of 143

islower() - Returns true if string has at least 1 cased character and all cased characters are in lowercase and false otherwise

isupper() - Returns true if string has at least one cased character and all cased characters are in uppercase and false otherwise

isnumeric() - Returns true if a string contains only numeric characters and false otherwise

isspace() - Returns true if string contains only whitespace characters and false otherwise

istitle() - Returns true if string is properly "titlecased" and false otherwise

join(seq) - Merges (concatenates) the strings in sequence seq into a string, with separator string

len(string) - Returns the length of the string

ljust(width, fillchar) - Returns a space-padded string with the original string left-justified to a total of width columns

28 of 143

rjust(width, fillchar) - Returns a space-padded string with the original string right-justified to a total of width columns

lower() - Converts all uppercase letters in string to lowercase

upper() - Converts lowercase letters in string to uppercase

lstrip() - Removes all leading whitespace in string

rstrip() - Removes all trailing whitespace of string

strip([chars]) - Performs both lstrip() and rstrip() on string

maketrans() - Returns a translation table to be used in the translate function

translate(table) - Translates string according to the translation table built by maketrans()

max(str) - Returns the max alphabetical character from the string str

29 of 143

min(str) - Returns the min alphabetical character from the string str

replace(old, new, [max]) - Replaces all occurrences of old in string with new, or at most max occurrences if max given

split(sep, maxsplit) - Splits string according to delimiter sep (whitespace if not provided) and returns list of substrings; performs at most maxsplit splits if given

splitlines() - Splits string at all NEWLINEs and returns a list of each line with NEWLINEs removed

swapcase() - Inverts case for all letters in string

title() - Returns "titlecased" version of string, that is, all words begin with uppercase, and the rest are lowercase

30 of 143

zfill(width) - Returns original string left-padded with zeros to a total of width characters; intended for numbers, zfill() retains any sign given (less one zero)

isdecimal() - Returns true if a string contains only decimal characters and false otherwise

31 of 143

Built-in string methods

capitalize()

str="kongu engineering college "# first letter is converted into capital

str1=str.capitalize()

print(str, "  ", str1)

Output: kongu engineering college      Kongu engineering college

# title()

str2=str.title()# first letter of all words are changed to capital letter

print(str,"  ", str2)

Output: kongu engineering college      Kongu Engineering College

32 of 143

# center()

str="kongu"

str1=str.center(11)

print(str)

print(str1)

str1=str.center(11,"*")

print(str)

print(str1)

Output :

kongu
   kongu   
kongu
***kongu***

33 of 143

# count() - Counts how many times a substring occurs in the string

str="kongu cse eee ece eee"

sub_str="eee"

cnt=str.count(sub_str) 

#(or) #cnt=str.count("eee") 

print(cnt)

cnt=str.count("civil")

print(cnt)

cnt=str.count("eee",5,15) # 5 starting index and 15 ending index

print(cnt)

cnt=str.count("eee",15)

print(cnt)

Output: 2 0 1 1

34 of 143

# endswith() and startswith()

str="kongu engineering college"

print(str.endswith("ege"))

print(str.endswith("ege",2,12))

print(str.endswith("ing"))

print(str.startswith("kon",2,15))

Output:

True

False

False

False

35 of 143

# expandtabs()

str=" kongu\tengineering\tcollege"

print(str.expandtabs())

str=" kongu\t\tengineering\t\tcollege"

print(str.expandtabs())

Output:

 kongu  engineering     college
 kongu          engineering             college

36 of 143

# find()

str="kongu engineering college"

ind=str.find("engineering")

print(ind)

ind1=str.find("engineering",4,10)

print(ind1)

ind1=str.find("engineering",4,20)

print(ind1)

Output: 6,-1,6

37 of 143

#rfind()-- Same as find(), but search backwards in string

str=" kongu engineering college"

ind=str.rfind("engineering")

print(ind)

ind=str.rfind("college")

print(ind)

ind=str.rfind("college",15)

print(ind)

Output: 7 19 19

38 of 143

# index()

str="kongu engineering college"

ind=str.index("engineering")

print(ind)

ind=str.index("engineering",0,25)

print(ind)

Output:

6

6

39 of 143

# rindex() --> searches backwards (str is the string from the index() example)

ind=str.rindex("engineering",0,25)

print(ind)

ind=str.rindex("engineering",3,25)

print(ind)

Output:

6

6

40 of 143

# isalnum() --> alphanumeric

  • isalnum() : Returns true if string has at least 1 character and all characters are alphanumeric and false otherwise

str1="kongu"

print(str1.isalnum())

str1="kongu123"

print(str1.isalnum())

str2="***$$$"

print(str2.isalnum())

Output: True True False

41 of 143

# isalpha()

  • isalpha() : Returns true if string has at least 1 character and all characters are alphabetic and false otherwise

str1="kongu123"

print(str1.isalpha())

str1="kongu"

print(str1.isalpha())

str1="123"

print(str1.isalpha())

Output : False True False

42 of 143

# isdigit()

  • isdigit() - Returns true if string contains only digits and false otherwise

str1="123"

print(str1.isdigit())

str1="kec123"

print(str1.isdigit())

Output : True

False

43 of 143

# islower() and isupper()

islower() - Returns true if string has at least 1 cased character and all cased characters are in lowercase and false otherwise

isupper() -Returns true if string has at least one cased character and all cased characters are in uppercase and false otherwise

# islower() and isupper()

str="kec"

print(str.islower())

str="Kec"

print(str.islower())

str1="KEC"

print(str1.isupper())

str1="KEc"

print(str1.isupper())

Output : True False True False

44 of 143

# isnumeric() and isspace()

isnumeric() - Returns true if a unicode string contains only numeric characters and false otherwise

isspace() - Returns true if string contains only whitespace characters and false otherwise

#isnumeric()

str="123"

print(str.isnumeric())

str="kec123"

print(str.isnumeric())

# isspace()

str=" kongu college"

print(str.isspace())

str="   "

print(str.isspace())

Output : True False False True

45 of 143

# istitle()

istitle()

Returns true if string is properly "titlecased" and false otherwise

#istitle()

str="Kongu Engineering College"

print(str.istitle())

str="Kongu engineering College"

print(str.istitle())

Output : True False

46 of 143

# join()

join(seq) - Merges (concatenates) the strings in sequence seq into a single string, using the string on which it is called as the separator

# The join() method takes all items in an iterable (list, tuple, string) and joins them into one string.

l1=["1","2","3"]

str1="kec"

new_str=str1.join(l1)

print(new_str)

l2=["kongu","kec"]

str2="college"

new_str1=str2.join(l2)

print(new_str1)

Output: 1kec2kec3

kongucollegekec

47 of 143

myTuple = ("John", "Peter", "Vicky")

x = "#".join(myTuple)

# (or)

str="#"

x=str.join(myTuple)

print(x)

Output : John#Peter#Vicky

48 of 143

# len()

len(string)

Returns the length of the string

str="electrical"

print(len(str))

Output : 10

49 of 143

ljust(width,[fillchar]) -Returns a space-padded string with the original string left-justified to a total of width columns

rjust(width,[ fillchar]) Returns a space-padded string with the original string right-justified to a total of width columns.

str="kongu"

print(str.ljust(11)," welcome")

print(str.ljust(10,'_'),"hello")

str="kongu"

print(str.rjust(50)," welcome")

print(str.rjust(50,'_')," hello")

Output :

kongu        welcome
kongu_____ hello
                                             kongu  welcome
_____________________________________________kongu  hello

50 of 143

lower() - Converts all uppercase letters in string to lowercase

#lower()

str="kec"

lower_str=str.lower()

print(str, "  ", lower_str)

str="Kec"

lower_str=str.lower()

print(str, "  ", lower_str)

str="KEC"

lower_str=str.lower()

print(str, "  ", lower_str)

Output :

kec    kec
Kec    kec
KEC    kec

51 of 143

# upper()

upper() - Converts lowercase letters in string to uppercase

#upper()

str="kec"

upper_str=str.upper()

print(str, "  ", upper_str)

str="Kec"

upper_str=str.upper()

print(str, "  ", upper_str)

str="KEC"

upper_str=str.upper()

print(str, "  ", upper_str)

Output :

kec    KEC
Kec    KEC
KEC    KEC

52 of 143

# lstrip()

lstrip() - Removes all leading whitespace in string

#lstrip()

str="          kongu"

print(str,"welcome")

print(str.lstrip(),"welcome")

Output :

          kongu welcome
kongu welcome

53 of 143

# rstrip()

rstrip() - Removes all trailing whitespace of string

str="kongu     "

print(str,"  ", " welcome")

print(str.rstrip(),"welcome")

Output:

kongu          welcome
kongu welcome

54 of 143

#strip()

strip([chars]) - Performs both lstrip() and rstrip() on string

str="       kongu         "

# without stripping, the spaces on both sides of kongu are printed

print(" hello",str,"welcome")

#new_str=str.rstrip().lstrip()

#print("hello",new_str,"welcome")

# same result as str.strip(): no space on either the left or the right of kongu

print("hello",str.rstrip().lstrip(),"welcome")

Output:

 hello        kongu          welcome
hello kongu welcome

55 of 143

  • maketrans() - Returns a translation table to be used in the translate function.

translate(table) - Translates string according to the translation table (a mapping of character codes) built by maketrans()

# maketrans() and translate()

intab = "aeiou"

outtab = "12345"

str = "this is string example....wow!!!"

trantab = str.maketrans(intab, outtab)

print(trantab)

print (str.translate(trantab))

Output: {97: 49, 101: 50, 105: 51, 111: 52, 117: 53}

th3s 3s str3ng 2x1mpl2....w4w!!!

56 of 143

# max() and min()

max(str) - Returns the max alphabetical character from the string

min(str) - Returns the min alphabetical character from the string

str="kongu"

print(max(str))

print(min(str))

str="kongu123"

print(max(str))

print(min(str))

Output : u g u 1

57 of 143

replace(old, new, [max]) - Replaces all occurrences of old in string with new, or at most max occurrences if max given

#replace()

str=" kec eee cse kec cse kec"

str1=str.replace("kec", "kongu")

print(str1)

str1=str.replace("kec", "kongu",1)

print(str1)

str1=str.replace("kec", "kongu",2)

print(str1)

Output:

 kongu eee cse kongu cse kongu
 kongu eee cse kec cse kec
 kongu eee cse kongu cse kec

58 of 143

split(sep=None, maxsplit=-1) - Splits string according to delimiter sep (any whitespace if not provided) and returns a list of substrings; performs at most maxsplit splits if given, so the list has at most maxsplit + 1 items

  • splitlines(keepends=False) - Splits string at all line boundaries and returns a list of the lines, with the line breaks removed unless keepends is true

59 of 143

#split and splitlines

# split() will create a list of substrings

str="kec eee cse kec cse kec"

sub_strings=str.split()# delimiter is space

print(sub_strings)

print(type(sub_strings))

split_str="kec"

sub_strings=str.split(split_str)# here the delimiter is kec

print(sub_strings)

str="keceeecsekeccsekec"

sub_strings=str.split(split_str)# here the delimiter is kec

print(sub_strings)

sub_strings=str.split(split_str,2)

print(sub_strings)

sub_strings=str.split(split_str,str.count(split_str))  # same as str.split(split_str,3)

print(sub_strings)

60 of 143

Output

['kec', 'eee', 'cse', 'kec', 'cse', 'kec']
<class 'list'>
['', ' eee cse ', ' cse ', '']
['', 'eeecse', 'cse', '']
['', 'eeecse', 'csekec']
['', 'eeecse', 'cse', '']

61 of 143

# splitlines()

lines='''abc

def

ghi

jkl

mno'''

print(lines)

lines_1=lines.splitlines()

print(lines_1)

62 of 143

# splitlines() output

abc

def

ghi

jkl

mno

['abc', 'def', 'ghi', 'jkl', 'mno']

63 of 143

# swapcase()

swapcase() - Inverts case for all letters in string.

str="KEC college"

print(str.swapcase())

str="Kec coLLege"

print(str.swapcase())

Output: kec COLLEGE

kEC COllEGE

64 of 143

# zfill() -> zero fill

str="kongu"

print(str.zfill(10))

str="123"

print(str.zfill(10))

Output:

00000kongu

0000000123

65 of 143

66 of 143

Negative index

67 of 143

# negative index

str="kongu"

print(str[-1])

print(str[-2])

print(str[0:2])

print(str[0:4])

print(str[0:4:1])

print(str[0:4:2])

print(str[0:-2])  # from index 0 up to, but not including, index -2

print(str[::-1])

print(str[-3::-1])

print("string",str[-4:-1:-1])# no answer

68 of 143

# negative index Output

u

g

ko

kong

kong

kn

kon

ugnok

nok

string

69 of 143

Stride during slicing

Reverse skipping 3rd char
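The slide titles above refer to the optional third slice value, the stride (step). A minimal sketch follows; the sample string is an assumption, and "reverse skipping 3rd char" is read here as a backwards slice with a step of 3, which may differ from the original slide image.

```python
s = "abcdefghi"   # sample string, assumed for illustration

every_second = s[::2]      # start to end, taking every 2nd character
reversed_s = s[::-1]       # a negative stride walks the string backwards
reverse_by_three = s[::-3] # backwards, taking every 3rd character
middle_by_three = s[1:8:3] # indices 1, 4 and 7

print(every_second)      # acegi
print(reversed_s)        # ihgfedcba
print(reverse_by_three)  # ifc
print(middle_by_three)   # beh
```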

70 of 143

In and not in

Ord() and chr()

71 of 143

# ord() (ordinal) and chr()

# ord() returns the Unicode code point of a character (0 to 127 for ASCII characters); chr() is the inverse

print(ord('a'))

print(ord('b'))

print(ord('A'))

print(ord('B'))

print(chr(97))

Output: 97

98

65

66

a

72 of 143

# in and not in

str="kongu engg college"

if "kec" in str:

  print("present")

else:

  print("not present")

if "kong" in str:

  print("present")

else:

  print("not present")

if "k" in str:

  print("present")

else:

  print("not present")

# not in

if "k" not in str:

  print("present")

else:

  print("not present")

73 of 143

# in and not in output

not present

present

present

not present

74 of 143

75 of 143

iteration
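Only the slide title survives here, so the following is a hedged sketch of what iterating over a string usually looks like: a for loop visits the characters one at a time, and enumerate() supplies the index as well. The sample word and variable names are assumptions.

```python
word = "kongu"   # sample string, assumed for illustration

# Visit each character in order.
for ch in word:
    print(ch)

# enumerate() yields (index, character) pairs.
pairs = list(enumerate(word))
for index, ch in pairs:
    print(index, ch)

# Iteration combined with a membership test: count the vowels.
vowel_count = sum(1 for ch in word if ch in "aeiou")
print(vowel_count)   # 2
```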

76 of 143

Validate PAN number
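The original slide content is not available, so this is only a sketch of one way to do the check with the string methods covered earlier. It assumes the usual PAN layout of five uppercase letters, four digits and one uppercase letter; the function name is_valid_pan is hypothetical.

```python
def is_valid_pan(pan):
    """Return True if pan looks like AAAAA9999A (assumed PAN layout)."""
    if len(pan) != 10 or not pan.isascii():
        return False
    letters, digits, last = pan[:5], pan[5:9], pan[9]
    return (letters.isalpha() and letters.isupper()
            and digits.isdigit()
            and last.isalpha() and last.isupper())

print(is_valid_pan("ABCDE1234F"))   # True
print(is_valid_pan("ABCDE12345"))   # False: last character is a digit
print(is_valid_pan("abcde1234f"))   # False: letters are lowercase
```

The same check can be written as a single regular expression once the re module has been introduced.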

Pattern

77 of 143

78 of 143

79 of 143

Help() in python

  • The python help function is used to display the documentation of  modules, functions, classes, keywords etc. 
  • Eg

  • Output is
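The example and its output were images in the original slide and are not recoverable. As a stand-in, here is a minimal sketch that asks for the documentation of one string method; the choice of str.upper is an assumption.

```python
# Print the built-in documentation for one string method.
help(str.upper)

# The same text is stored on the object as its docstring.
doc = str.upper.__doc__
print(doc)
```

help() also accepts a module, a class, or a keyword given as a string, for example help("for").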

80 of 143

The string module

81 of 143

The string module

  • Python String module contains some constants, utility function, and classes for string manipulation.
  • Eg.

  • output:
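The slide's own example was an image and is missing. A short sketch of the constants the string module provides is given below; which constants the original slide printed is not known.

```python
import string

print(string.ascii_lowercase)   # abcdefghijklmnopqrstuvwxyz
print(string.ascii_uppercase)   # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(string.digits)            # 0123456789
print(string.punctuation)       # the ASCII punctuation characters

# The constants are ordinary strings, so membership tests work on them.
is_digit = "7" in string.digits
print(is_digit)                 # True
```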

82 of 143

String module capwords()

import string

print(string.capwords("kec"))

Output is: Kec

83 of 143

String module

  • To see the contents of string module
  • dir(string)

84 of 143

String module

  • To know the details of a particular item, use type()
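A small sketch of dir() and type() applied to the string module follows. The list comprehension that hides names beginning with an underscore is an added convenience, not part of the slides.

```python
import string

# dir() lists every name defined in the module; skip the private ones.
names = [n for n in dir(string) if not n.startswith("_")]
print(names)

# type() tells what kind of object a particular name refers to.
print(type(string.digits))     # <class 'str'>
print(type(string.capwords))   # <class 'function'>
```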

85 of 143

Regular Expression - A special sequence of characters that helps to match or find strings in another string

match() - returns a match object only if the pattern is present at the beginning of the string; otherwise it returns None

86 of 143

Search()

Sub()

87 of 143

findall()

88 of 143

finditer()- returns an iterator. Used to print index of match in the given string
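As a minimal sketch of the description above, re.finditer() yields one match object per occurrence, and each match object reports where it was found. The sample text is an assumption.

```python
import re

text = "kec eee cse kec"   # sample text, assumed for illustration

# Each item from the iterator is a match object.
for m in re.finditer(r"kec", text):
    print(m.start(), m.end(), m.group())

# Collect just the (start, end) index pairs.
spans = [m.span() for m in re.finditer(r"kec", text)]
print(spans)   # [(0, 3), (12, 15)]
```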

89 of 143

Flag options
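The slide lists no specific flags, so the sketch below picks three commonly used ones from the re module: re.IGNORECASE, re.MULTILINE and re.DOTALL. The sample text is an assumption.

```python
import re

text = "Kongu\nkongu college"   # two lines, assumed for illustration

plain = re.findall(r"kongu", text)                      # case-sensitive
nocase = re.findall(r"kongu", text, re.IGNORECASE)      # ignore case

# Without MULTILINE, ^ matches only at the start of the whole string.
first_only = re.findall(r"^kongu", text, re.IGNORECASE)
# Flags are combined with |; MULTILINE lets ^ match after each newline.
each_line = re.findall(r"^kongu", text, re.IGNORECASE | re.MULTILINE)

# By default "." does not match a newline; DOTALL changes that.
no_dotall = re.search(r"Kongu.kongu", text)
with_dotall = re.search(r"Kongu.kongu", text, re.DOTALL)

print(plain, nocase, first_only, each_line)
print(no_dotall, with_dotall is not None)
```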

90 of 143

Meta characters in RE

91 of 143

92 of 143

Check if string has at least one vowel

Use of metacharacter * and +
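The code for these two slide titles was not preserved. A possible sketch is shown below: a character class finds a vowel, and the second half contrasts * (zero or more) with + (one or more). The function name has_vowel and the sample strings are assumptions.

```python
import re

def has_vowel(text):
    """Return True if text contains at least one vowel (either case)."""
    return re.search(r"[aeiou]", text, re.IGNORECASE) is not None

print(has_vowel("kongu"))    # True
print(has_vowel("rhythm"))   # False

sample = "a ab abbb"
star = re.findall(r"ab*", sample)   # 'a' followed by zero or more 'b'
plus = re.findall(r"ab+", sample)   # 'a' followed by one or more 'b'
print(star)   # ['a', 'ab', 'abbb']
print(plus)   # ['ab', 'abbb']
```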

93 of 143

94 of 143

Groups

95 of 143

Named capturing groups - have the format (?P<name>…), where name is the name of the group

Non-capturing groups - have the format (?:…) and are not accessible through the group() method, so they can be added to an existing regular expression without breaking the numbering
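The two group formats can be sketched as follows. The date and title patterns, and the sample inputs, are illustrative assumptions.

```python
import re

# Named groups: each part can be fetched by name as well as by number.
date = re.match(r"(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})", "15-08-2023")
year = date.group("year")
parts = date.groupdict()
print(year)    # 2023
print(parts)   # {'day': '15', 'month': '08', 'year': '2023'}

# Non-capturing group: (?:Mr|Ms) groups the alternatives but takes no number,
# so the name that follows is still group 1.
title = re.match(r"(?:Mr|Ms)\. (\w+)", "Ms. Priya")
first = title.group(1)
all_groups = title.groups()
print(first)        # Priya
print(all_groups)   # ('Priya',)
```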

96 of 143

97 of 143

Application of Regular Expression to extract email

98 of 143

99 of 143

100 of 143

101 of 143

Python Additional Regular Expressions

102 of 143

Regular Expressions

  • Special sequence of characters to match or find other strings or set of strings using a specialized syntax held in a pattern
  • Widely used in UNIX tools and in languages such as PHP and Perl
  • The re module provides support for regular expressions in Python
  • The re module raises the re.error exception if an error occurs while compiling or using a regular expression

103 of 143

Regular Expressions

  • Regular expressions are a powerful string manipulation tool
  • All modern languages have similar library packages for regular expressions
  • Use regular expressions to:
    • Search a string (search and match)
    • Replace parts of a string (sub)
    • Break strings into smaller pieces (split)

104 of 143

105 of 143

106 of 143

Python’s Regular Expression Syntax

  • Most characters match themselves

The regular expression "test" matches the string 'test', and only that string

  • [x] matches any one of a list of characters

"[abc]" matches 'a', 'b', or 'c'

  • [^x] matches any one character that is not included in x

"[^abc]" matches any single character except 'a', 'b', or 'c'

107 of 143

Python’s Regular Expression Syntax

  • "." matches any single character (except a newline, by default)
  • Parentheses can be used for grouping

"(abc)+" matches 'abc', 'abcabc', 'abcabcabc', etc.

  • x|y matches x or y

"this|that" matches 'this' and 'that', but not 'thisthat'.

108 of 143

Python's Regular Expression Syntax

  • x* matches zero or more x's

"a*" matches '', 'a', 'aa', etc.

  • x+ matches one or more x's

"a+" matches 'a', 'aa', 'aaa', etc.

  • x? matches zero or one x's

"a?" matches '' or 'a'

  • x{m,n} matches i x's, where m ≤ i ≤ n

"a{2,3}" matches 'aa' or 'aaa'

109 of 143

Regular Expression Syntax

  • "\d" matches any digit; "\D" any non-digit
  • "\s" matches any whitespace character; "\S" any non-whitespace character
  • "\w" matches any alphanumeric character or underscore; "\W" any other character
  • "^" matches the beginning of the string; "$" the end of the string
  • "\b" matches a word boundary; "\B" matches a position that is not a word boundary
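The special sequences in the list above can be tried out directly. This is a small sketch with an assumed sample string; raw strings (r"...") are used so that the backslashes reach the re module unchanged.

```python
import re

text = "Room 42, Block B_7"   # sample text, assumed for illustration

digits = re.findall(r"\d+", text)       # runs of digits
words = re.findall(r"\w+", text)        # runs of letters, digits, underscore
pieces = re.split(r"\s+", text)         # split on whitespace
b_words = re.findall(r"\bB\w*", text)   # words that start with B
starts = re.match(r"^Room", text) is not None
ends = re.search(r"7$", text) is not None

print(digits)    # ['42', '7']
print(words)     # ['Room', '42', 'Block', 'B_7']
print(pieces)    # ['Room', '42,', 'Block', 'B_7']
print(b_words)   # ['Block', 'B_7']
print(starts, ends)
```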

110 of 143

Search and Match

  • The two basic functions are re.search and re.match
    • Search looks for a pattern anywhere in a string
    • Match looks for a match starting at the beginning
  • Both return None (logical false) if the pattern isn’t found and a “match object” instance if it is

>>> import re
>>> pat = "a*b"
>>> re.search(pat, "fooaaabcde")
<re.Match object; span=(3, 7), match='aaab'>
>>> re.match(pat, "fooaaabcde")
>>>

111 of 143

Q: What’s a match object?

  • A: an instance of the match class with the details of the match result

>>> r1 = re.search("a*b","fooaaabcde")

>>> r1.group() # group returns string matched

'aaab'

>>> r1.start() # index of the match start

3

>>> r1.end() # index of the match end

7

>>> r1.span() # tuple of (start, end)

(3, 7)

112 of 143

What got matched?

  • Here’s a pattern to match simple email addresses

\w+@(\w+\.)+(com|org|net|edu)

>>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
>>> r1 = re.match(pat1, "finin@cs.umbc.edu")
>>> r1.group()
'finin@cs.umbc.edu'

  • We might want to extract the pattern parts, like the email name and host

113 of 143

What got matched?

  • We can put parentheses around groups we want to be able to reference

>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2, "finin@cs.umbc.edu")
>>> r2.group(1)
'finin'
>>> r2.group(2)
'cs.umbc.edu'
>>> r2.groups()
('finin', 'cs.umbc.edu', 'umbc.', 'edu')

  • Note that the groups are numbered by the position of their opening parenthesis, counting from left to right, so a nested group comes right after the group that contains it

114 of 143

What got matched?

  • We can ‘label’ the groups as well…

>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3, "finin@cs.umbc.edu")
>>> r3.group('name')
'finin'
>>> r3.group('host')
'cs.umbc.edu'

  • And reference the matching parts by the labels

115 of 143

More re functions

  • re.split() is like split but can use patterns

>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']

  • re.sub substitutes one string for a pattern

>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'

  • re.findall() finds all matches

>>> re.findall(r"\d+", "12 dogs,11 cats, 1 egg")
['12', '11', '1']

116 of 143

Compiling regular expressions

  • If you plan to use a re pattern more than once, compile it to a re object
  • Python produces a special data structure that speeds up matching

>>> cpat3 = re.compile(pat3)
>>> cpat3
re.compile('(?P<name>\\w+)@(?P<host>(\\w+\\.)+(com|org|net|edu))')
>>> r3 = cpat3.search("finin@cs.umbc.edu")
>>> r3
<re.Match object; span=(0, 17), match='finin@cs.umbc.edu'>
>>> r3.group()
'finin@cs.umbc.edu'

117 of 143

Pattern object methods

Pattern objects have methods that parallel the re functions (e.g., match, search, split, findall, sub), e.g.:

>>> # email address
>>> p1 = re.compile(r"\w+@\w+\.(?:com|org|net|edu)")
>>> p1.match("steve@apple.com").group(0)
'steve@apple.com'
>>> p1.search("Email steve@apple.com today.").group(0)
'steve@apple.com'
>>> p1.findall("Email steve@apple.com and bill@msft.com now.")
['steve@apple.com', 'bill@msft.com']

>>> # sentence boundary
>>> p2 = re.compile(r"[.?!]+\s+")
>>> p2.split("Tired? Go to bed! Now!! ")
['Tired', 'Go to bed', 'Now', '']

118 of 143

Example: pig latin

  • Rules
    • If word starts with consonant(s)
      • Move them to the end, append “ay”
    • Else word starts with vowel(s)
      • Keep as is, but add “zay”
    • How might we do this?

119 of 143

The pattern

([bcdfghjklmnpqrstvwxyz]+)(\w+)

120 of 143

piglatin.py

import re

pat = r'([bcdfghjklmnpqrstvwxyz]+)(\w+)'
cpat = re.compile(pat)

def piglatin(string):
    # convert each whitespace-separated word and join the results with spaces
    return " ".join([piglatin1(w) for w in string.split()])

121 of 143

piglatin.py

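def piglatin1(word):
    """Returns the pig latin form of a word, e.g.:
    piglatin1("dog") => "ogday"."""
    match = cpat.match(word)
    if match:
        # word starts with consonant(s): move them to the end, append "ay"
        consonants = match.group(1)
        rest = match.group(2)
        return rest + consonants + "ay"
    else:
        # word starts with a vowel: keep as is, but add "zay"
        return word + "zay"

print(piglatin("the quick dog"))  # sample sentence; any lowercase text works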

122 of 143

123 of 143

124 of 143

125 of 143

126 of 143

127 of 143

128 of 143

#include <stdio.h>

void add(void);

int main(void)
{
    /* ----- */
    add();
    return 0;
}

void add(void)
{
    int a = 4, b = 9;
    printf("%d", a + b);
}

129 of 143

Date & Time

  • Python's time and calendar modules help track dates and times
  • time module provides functions for working with times and for converting between representations
  • Function time.time() returns the current system time as a floating point number of seconds since the epoch (00:00:00 UTC, January 1, 1970)
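A minimal sketch of time.time() follows: it prints the current timestamp and then uses two readings to measure how long a piece of work took. The summing loop is only a stand-in for real work.

```python
import time

now = time.time()   # seconds since the epoch, as a float
print(now)

# Two readings taken around some work give the elapsed wall-clock time.
start = time.time()
total = sum(range(1_000_000))   # placeholder workload
elapsed = time.time() - start
print("elapsed seconds:", elapsed)
```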

130 of 143

TimeTuple

131 of 143

Current time

132 of 143

Getting formatted time

133 of 143

134 of 143

  • strftime() provided by time module converts a struct_time object to a string
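The slides that showed the time tuple and the formatted output were images. The sketch below uses time.localtime(), time.gmtime(), time.asctime() and time.strftime(); the format string is an assumed example, and the epoch is used for the second part so that the result is the same on every run.

```python
import time

# struct_time for the current local time (the "time tuple").
current = time.localtime()
print(current.tm_year, current.tm_mon, current.tm_mday)
print(time.strftime("%Y-%m-%d %H:%M:%S", current))

# struct_time for the epoch in UTC gives a fixed, repeatable result.
epoch = time.gmtime(0)
stamp = time.strftime("%Y-%m-%d %H:%M:%S", epoch)
readable = time.asctime(epoch)
print(stamp)      # 1970-01-01 00:00:00
print(readable)   # Thu Jan  1 00:00:00 1970
```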

135 of 143

136 of 143

137 of 143

138 of 143

139 of 143

140 of 143

141 of 143

142 of 143

Getting calendar for a month

  • The calendar module provides a wide range of functions for working with yearly and monthly calendars

calendar.isleap(year) returns True if year is a leap year
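A short sketch of the calendar module is given below. The year and month are assumed sample values.

```python
import calendar

# Text calendar for one month.
print(calendar.month(2024, 2))

# Leap-year checks.
leap_2024 = calendar.isleap(2024)
leap_2023 = calendar.isleap(2023)
leap_1900 = calendar.isleap(1900)   # divisible by 100 but not by 400
print(leap_2024, leap_2023, leap_1900)   # True False False

# monthrange() gives (weekday of the 1st, number of days); Monday is 0.
first_weekday, days = calendar.monthrange(2024, 2)
print(first_weekday, days)   # 3 29
```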

143 of 143

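Time - clock() Method

  • clock() returned the current processor time as a floating point number expressed in seconds. It was removed in Python 3.8; time.process_time() is its replacement and does not count time spent sleeping
  • Example:

import time

print(time.process_time())

time.sleep(20.5)

print(time.process_time())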