This post shows you how to use regular expressions in Python.
Step 1 Import
Regular expressions are not part of Python's core syntax, but the standard library provides them through the re module, so the first step is to import it.
import re
Step 2 Compile
The next step is to compile the regular expression. The re.compile function accepts a regular expression pattern and returns a regular expression object.
telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')
The regular expression above matches four digits, a hyphen, three digits, another hyphen and finally four digits. Prefixing the string literal with r makes Python treat it as a raw string, which turns off the normal special handling of the backslash character.
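As a rough illustration of the two points above, the sketch below writes the same pattern with {n} repetition quantifiers and checks what the r prefix does to the backslash. The names compact_regex and sample are made up for this example and are not used elsewhere in the post.

```python
import re

# Pattern from the post, written out digit by digit.
telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')

# An equivalent pattern using {n} quantifiers (compact_regex is a hypothetical name).
compact_regex = re.compile(r'\d{4}-\d{3}-\d{4}')

sample = 'Call 0141-321-3691 today'
print(telephone_regex.search(sample).group())  # 0141-321-3691
print(compact_regex.search(sample).group())    # 0141-321-3691

# In a raw string the backslash is kept literally, so r'\d' is two characters:
# a backslash followed by the letter d.
print(len(r'\d'))       # 2
print(r'\d' == '\\d')   # True
```

Both forms match the same text; the quantifier version is simply shorter to read once patterns grow.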
There is some debate about whether re.compile is required. The documentation for re.compile notes that this step isn’t necessary when your program only uses a few regular expressions at a time, because the re module caches recently compiled patterns. If you would like to know more about the necessity of re.compile, this Stack Overflow question is a good place to start.
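For comparison, here is a minimal sketch of the same search done with and without re.compile. It uses the module-level functions re.search and re.findall, which take the pattern as their first argument. The variable names pattern, text and compiled are assumptions made for this example.

```python
import re

pattern = r'\d\d\d\d-\d\d\d-\d\d\d\d'
text = 'You can contact me on 0141-321-3691'

# Without compiling: pass the pattern string to the module-level functions.
direct_match = re.search(pattern, text)
direct_all = re.findall(pattern, text)

# With compiling: build the regular expression object once, then call its methods.
compiled = re.compile(pattern)
compiled_match = compiled.search(text)
compiled_all = compiled.findall(text)

print(direct_match.group())    # 0141-321-3691
print(compiled_match.group())  # 0141-321-3691
print(direct_all == compiled_all)  # True
```

Either style gives the same results. Compiling mainly helps readability when one pattern is reused in several places.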
Step 3 First match or All matches?
The regular expression will be used to match the telephone numbers within this string:
text_to_search = 'You can contact me on 0171-123-4567 or 0141-321-3691'
As there are two telephone numbers, do you want to match only the first one, or all of them?
First match
The search method returns the first match as a match object, which in this example will contain 0171-123-4567.
first_search_result = telephone_regex.search(text_to_search)
All matches
The findall method returns all matches as a list, which in this example will contain both telephone numbers: 0171-123-4567 and 0141-321-3691.
all_results = telephone_regex.findall(text_to_search)
What happens when nothing is found?
If the search method doesn’t find a match, it returns None instead of a match object. This can lead to an AttributeError if you call a match object method on the result without checking it first. The example below illustrates this behavior.
# no telephone numbers within the string
text_to_search = 'You can contact me on or'
# No matches are found, so first_search_result is None,
# not a match object
first_search_result = telephone_regex.search(text_to_search)
# shows that first_search_result is of type NoneType
print(type(first_search_result))
# None does not have a method called group, so this line
# raises an AttributeError
print(first_search_result.group())
In contrast to the search method, the findall method never returns None. If it fails to find a match, it returns an empty list.
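One common way to avoid the exception is to check the result of search before calling group. The sketch below wraps that check in a small helper; first_number is a hypothetical name chosen for this example, not something provided by the re module.

```python
import re

telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')


def first_number(text):
    """Return the first telephone number in text, or None if there is none."""
    match = telephone_regex.search(text)
    if match is None:
        # Nothing matched, so there is no match object to call group() on.
        return None
    return match.group()


print(first_number('You can contact me on or'))  # None
print(first_number('Call 0141-321-3691'))        # 0141-321-3691

# findall needs no such check: an empty list is safe to loop over or print.
print(telephone_regex.findall('You can contact me on or'))  # []
```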
Step 4 Viewing the results
One way to view the contents of a match object is to use the group method. In the example below, group is combined with print to output the result of searching the original text_to_search string (the one containing two telephone numbers) to the console.
print(first_search_result.group())
Console output:
0171-123-4567
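Besides group, a match object can also report where the match was found. The sketch below uses the start, end and span methods of the match object; the variable name match is an assumption made for this example.

```python
import re

telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')
text_to_search = 'You can contact me on 0171-123-4567 or 0141-321-3691'

match = telephone_regex.search(text_to_search)

# group() returns the matched text itself.
print(match.group())

# start() and end() give the position of the match within the string,
# and span() returns both as a tuple.
print(match.start(), match.end())
print(match.span())

# Slicing the original string with those positions gives the same text as group().
print(text_to_search[match.start():match.end()] == match.group())  # True
```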
The findall method returns a list, which print can output to the console directly.
print(all_results)
Console output:
['0171-123-4567', '0141-321-3691']
Putting it all together
This Python module brings together the code shown throughout this post and can be used as a starting point for your own experiments.
import re
text_to_search = 'You can contact me on 0171-123-4567 or 0141-321-3691'
telephone_regex = re.compile(r'\d\d\d\d-\d\d\d-\d\d\d\d')
# finds 0171-123-4567
first_search_result = telephone_regex.search(text_to_search)
print(first_search_result.group())
# finds 0171-123-4567 and 0141-321-3691
all_results = telephone_regex.findall(text_to_search)
print(all_results)
Acknowledgements
Al Sweigart for his superb Udemy course Automate the Boring Stuff with Python