Python Regular expressions in four steps

This post shows you how to use Regular expressions in Python.

Step 1 Import

Regular expressions are not part of Python's core syntax. Support is provided by the re module in the standard library, so the first step is to import it

import re

Step 2 Compile

The next step is to compile the Regular expression. The compile method accepts a Regular expression and returns a Regular expression object.

telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

The Regular expression above matches four digits, a hyphen, three digits, another hyphen and finally four digits. Prefixing the string with r makes Python treat it as a raw string, which turns off the normal special handling of the backslash character.
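As a small aside, the same pattern can be spelled more compactly with quantifiers, and the effect of the r prefix can be checked directly. This is only a sketch: the names verbose_regex, compact_regex, escaped_regex and the sample string are made up for illustration and are not part of the code in this post.

```python
import re

# Pattern as written in this post: every digit spelled out
verbose_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

# Alternative spelling with {n} quantifiers (an assumption, not the post's code)
compact_regex = re.compile(r'\d{4}-\d{3}-\d{4}')

# Hypothetical sample text
sample = 'Call 0141-321-3691 today'

print(verbose_regex.search(sample).group())
print(compact_regex.search(sample).group())

# Without the r prefix, each backslash is written twice so that Python
# passes a single backslash on to the re module
escaped_regex = re.compile('\\d{4}-\\d{3}-\\d{4}')
print(escaped_regex.pattern == compact_regex.pattern)
```

Both spellings should find the same number in the sample; which one reads better is largely a matter of taste.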

There is some debate about whether re.compile is required. The documentation for the compile method notes that the re module caches the compiled versions of the most recently used patterns, so programs that use only a few Regular expressions at a time needn't worry about compiling them. If you would like to know more about the necessity of re.compile, this Stack Overflow question is a good place to start.
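To make the two options concrete, here is a minimal sketch that runs the same search with and without an explicit compile step. The sample string and the variable names are hypothetical.

```python
import re

sample = 'Office: 0141-321-3691'  # hypothetical sample text
pattern = r'\d\d\d\d-\d\d\d-\d\d\d\d'

# Option 1: compile once, then call methods on the pattern object
compiled_match = re.compile(pattern).search(sample)

# Option 2: skip the explicit compile and use the module-level function
direct_match = re.search(pattern, sample)

# Both approaches should find the same text
print(compiled_match.group() == direct_match.group())
```

Compiling up front mostly pays off in readability when the same pattern is reused in several places.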

Step 3 First match or All matches?

The Regular expression will be used to match the telephone numbers within this string:

text_to_search = 'You can contact me on 0171-123-4567 or 0141-321-3691'

As there are two telephone numbers, do you want to match only the first one or all of them?

First match

The search method returns the first match. It returns a match object, which in this example will contain 0171-123-4567.

first_search_result = telephone_regex.search(text_to_search)
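A match object carries more than the matched text. The sketch below, which uses a made-up sample string and variable names, reads both the text and its position using the group and span methods.

```python
import re

telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

# Hypothetical sample text modelled on the one in this post
sample = 'Ring 0171-123-4567 or 0141-321-3691'

match = telephone_regex.search(sample)

# group() returns the matched text
matched_text = match.group()

# span() returns the (start, end) indexes of the match within the string
start, end = match.span()

print(matched_text)
print(start, end)

# Slicing the original string with the span gives the matched text back
print(sample[start:end] == matched_text)
```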

All matches

The findall method returns all matches. It returns a list, which in this example will contain both telephone numbers: 0171-123-4567 and 0141-321-3691.

all_results = telephone_regex.findall(text_to_search)
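Note that findall hands back plain strings rather than match objects. If you also need the position of each match, the finditer method (not covered elsewhere in this post) yields match objects instead. A short sketch with a hypothetical sample string:

```python
import re

telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

# Hypothetical sample text modelled on the one in this post
sample = 'Ring 0171-123-4567 or 0141-321-3691'

# findall returns a list of strings, one per match
numbers = telephone_regex.findall(sample)
for number in numbers:
    print(number)

# finditer yields one match object per match, so positions are available too
positions = [(m.group(), m.start()) for m in telephone_regex.finditer(sample)]
print(positions)
```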

What happens when nothing is found?

If the search method doesn’t find a match, it returns None instead of a match object. This can lead to an exception if you don’t anticipate it. The example below illustrates this behavior.

# no telephone numbers within the string
text_to_search = 'You can contact me on or'

# No matches are found so first_search_result is None,
# not a match object
first_search_result = telephone_regex.search(text_to_search)

# shows that the type of first_search_result is NoneType
print(type(first_search_result))

# None does not have a method called group so this line
# will raise an AttributeError
print(first_search_result.group())

The findall method behaves differently when it fails to find a match. Instead of None it returns an empty list, so looping over the result is safe without an extra check.
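One common way to avoid the exception is to check for None before calling group. The sketch below is one possible guard, not the only one; the variable names and the message text are made up for illustration.

```python
import re

telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

# no telephone numbers within the string
no_numbers = 'You can contact me on or'

result = telephone_regex.search(no_numbers)

# Only call group() when search actually returned a match object
if result is None:
    message = 'No telephone number found'
else:
    message = result.group()
print(message)

# findall needs no guard: looping over an empty list simply does nothing
all_found = telephone_regex.findall(no_numbers)
for number in all_found:
    print(number)
```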

Step 4 Viewing the results

One way to view the contents of a match object is to use its group method. In the example below, group is combined with print to output the result of the first search from Step 3 to the console.

print(first_search_result.group())

Console output:

0171-123-4567

The findall method returns a list, so print can be used to output its contents to the console.

print(all_results)

Console output:

['0171-123-4567', '0141-321-3691']

Putting it all together

This Python module brings together the code shown throughout this post and can be used as a starting point for your own experiments.

import re

text_to_search = 'You can contact me on 0171-123-4567 or 0141-321-3691'

telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

# finds 0171-123-4567
first_search_result = telephone_regex.search(text_to_search)

print(first_search_result.group())

# finds 0171-123-4567 and 0141-321-3691
all_results = telephone_regex.findall(text_to_search)

print(all_results)


Acknowledgements

Al Sweigart for his superb Udemy course Automate the Boring Stuff with Python
