Regular Expressions in Python


Example of a raw string

  • Prefixing a string literal with r makes it a raw string: Python leaves the backslashes untouched, so printing it shows the full string exactly as typed



We want the regular expression engine to interpret the strings we pass in, without Python processing their backslashes first.
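As a quick illustration of the difference, here is a minimal sketch. The variable names normal and raw are made up for this example.

```python
# A normal string: Python turns \t into a single tab character.
normal = 'a\tb'
# A raw string: the backslash and the t stay as two separate characters.
raw = r'a\tb'

print(normal)       # a, then a tab, then b
print(raw)          # a\tb
print(len(normal))  # 3
print(len(raw))     # 4
```

The raw version is what we want to hand to re, since sequences such as \d and \b then reach the regex engine unchanged.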


import re

text_to_search = '''

Ha HaHa

MetaCharacters (Need to be escaped):
. ^ $ * + ? { } [ ] \ | ( )


Mr. Powers
Mr Smith
Ms Davis
Mrs. Robinson
Mr. T
'''

sentence = 'Start a sentence and then bring it to an end'

# Compile the pattern we want to search for
pattern = re.compile(r'devin')

# Now let's search through our text with this pattern
matches = pattern.finditer(text_to_search)

for match in matches:
    print(match)

<re.Match object; span=(142, 147), match='devin'>

The span gives the start and end index of the match.

finditer found one match for devin in our text_to_search string, running from index 142 up to (but not including) index 147.

These indexes are useful because we can plug them into Python's string slicing and get back the exact match.

match = text_to_search[142:147]
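Rather than copying the numbers by hand, the span can be read straight off the match object. The sketch below assumes a short, made-up sample string; only the pattern r'devin' comes from this post.

```python
import re

# Hypothetical sample text; only the pattern r'devin' comes from the post.
text = 'contact devin today'
pattern = re.compile(r'devin')

for match in pattern.finditer(text):
    start, end = match.span()   # the same two numbers printed as span=(start, end)
    sliced = text[start:end]    # slicing with the span gives back the matched text
    print(start, end, sliced, match.group())
```

match.group() returns the matched text directly, so slicing is mostly useful when you also need the surrounding context.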



What happens if there is more than one instance of the pattern? For example, suppose we pass in the pattern owers instead.


<re.Match object; span=(148, 153), match='owers'>
<re.Match object; span=(230, 235), match='owers'>
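finditer yields one match object per occurrence, in order. A minimal sketch, assuming a made-up sample string in place of text_to_search:

```python
import re

# Hypothetical sample text containing the pattern twice.
text = 'Powers and Flowers'
pattern = re.compile(r'owers')

# Collect the span of every occurrence, left to right.
spans = [match.span() for match in pattern.finditer(text)]
print(spans)  # [(1, 6), (13, 18)]
```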

How do we deal with MetaCharacters?

pattern = re.compile(r'.')


<re.Match object; span=(1, 2), match='a'>
<re.Match object; span=(2, 3), match='b'>
<re.Match object; span=(3, 4), match='c'>
<re.Match object; span=(4, 5), match='d'>
<re.Match object; span=(5, 6), match='e'>

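An unescaped . is a metacharacter that matches any character except a newline, so this pattern matches every character in the string, not just the literal periods.

To match a literal period we need to escape it with a backslash: \.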

pattern = re.compile(r'devinpowers\.com')


<re.Match object; span=(142, 157), match='devinpowers.com'>
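To see why the backslash matters, the sketch below runs the escaped and unescaped patterns against a made-up string that contains a decoy. The decoy text and variable names are assumptions for illustration.

```python
import re

# The second token is a made-up decoy with an X where the period should be.
text = 'devinpowers.com devinpowersXcom'

unescaped = re.findall(r'devinpowers.com', text)   # . matches any character
escaped = re.findall(r'devinpowers\.com', text)    # \. matches only a period

print(unescaped)  # ['devinpowers.com', 'devinpowersXcom']
print(escaped)    # ['devinpowers.com']

# re.escape builds the escaped form of a literal string for us.
print(re.escape('devinpowers.com'))  # devinpowers\.com
```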

We use regular expressions to find patterns


  • The metacharacters and special sequences below can be combined to build a search pattern

.       - Any Character Except New Line
\d      - Digit (0-9)
\D      - Not a Digit (0-9)
\w      - Word Character (a-z, A-Z, 0-9, _)
\W      - Not a Word Character
\s      - Whitespace (space, tab, newline)
\S      - Not Whitespace (space, tab, newline)

\b      - Word Boundary
\B      - Not a Word Boundary
^       - Beginning of a String
$       - End of a String

[]      - Matches Characters in brackets
[^ ]    - Matches Characters NOT in brackets
|       - Either Or
( )     - Group

Quantifiers:
*       - 0 or More
+       - 1 or More
?       - 0 or One
{3}     - Exact Number
{3,4}   - Range of Numbers (Minimum, Maximum)
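A few of these can be tried out on a small string. This is only a sketch: the sample strings are made up, and re.findall is used instead of finditer just to keep the output short.

```python
import re

sample = 'Cat 42, dog_7!'  # made-up sample string

digits = re.findall(r'\d', sample)        # every single digit
words = re.findall(r'\w+', sample)        # runs of one or more word characters
non_word = re.findall(r'\W', sample)      # everything that is not a word character
two_digits = re.findall(r'\d{2}', sample) # exactly two digits in a row
vowels = re.findall(r'[aeiou]', sample)   # any one character from the set
at_start = re.findall(r'^Cat', sample)    # only matches at the beginning
optional = re.findall(r'colou?r', 'color colour')  # the u is optional

print(digits)      # ['4', '2', '7']
print(words)       # ['Cat', '42', 'dog_7']
print(non_word)    # [' ', ',', ' ', '!']
print(two_digits)  # ['42']
print(vowels)      # ['a', 'o']
print(at_start)    # ['Cat']
print(optional)    # ['color', 'colour']
```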

Sample Regexes:



  • Find all the digits in this sentence
import re

sentence = 'Start23 a sentence69 and then bring it420 to an end'

pattern = re.compile(r'\d')

matches  = pattern.finditer(sentence)

for match in matches:
    print(match)


<re.Match object; span=(5, 6), match='2'>
<re.Match object; span=(6, 7), match='3'>
<re.Match object; span=(18, 19), match='6'>
<re.Match object; span=(19, 20), match='9'>
<re.Match object; span=(38, 39), match='4'>
<re.Match object; span=(39, 40), match='2'>
<re.Match object; span=(40, 41), match='0'>

We can combine a bunch of these snippets and search for things like a phone number.

Example of this:

pattern = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')


<re.Match object; span=(159, 171), match='321-555-4321'>
<re.Match object; span=(172, 184), match='123.555.1234'>
<re.Match object; span=(185, 197), match='123*555*1234'>
<re.Match object; span=(198, 210), match='800-555-1234'>
<re.Match object; span=(211, 223), match='900-555-1234'>
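Notice that 123*555*1234 is matched as well, because . accepts any character as a separator. If only dashes and periods should count, a character set is one way to tighten the pattern. The sketch below reuses three of the numbers from the output above; the names loose and strict are made up for the comparison.

```python
import re

numbers = '321-555-4321\n123.555.1234\n123*555*1234'

loose = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')            # any separator
strict = re.compile(r'\d\d\d[-.]\d\d\d[-.]\d\d\d\d')     # only - or . as separator

loose_hits = loose.findall(numbers)
strict_hits = strict.findall(numbers)

print(loose_hits)   # ['321-555-4321', '123.555.1234', '123*555*1234']
print(strict_hits)  # ['321-555-4321', '123.555.1234']
```

Inside square brackets the period is literal, so it does not need a backslash there.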

Another Example using a .txt file

import re

#pattern = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')
pattern = re.compile(r'\d{3}.\d{3}.\d{4}')

with open('data.txt', 'r') as f:
    # Read the whole file into one string so the pattern can scan it
    contents = f.read()
    matches = pattern.finditer(contents)

    for match in matches:
        print(match)


We can see it’s super easy to parse .txt files using re in Python!
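Since data.txt itself is not included here, the following sketch writes a small stand-in file first so it can run on its own. The file contents and the use of tempfile are assumptions for illustration; the pattern is the one used above.

```python
import os
import re
import tempfile

pattern = re.compile(r'\d{3}.\d{3}.\d{4}')

# Write a small stand-in file so the sketch runs without the post's data.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('Office: 800-555-1234\nMobile: 900-555-1234\nNo number here\n')
    path = tmp.name

try:
    # Read the whole file into one string, then scan it with the pattern.
    with open(path, 'r', encoding='utf-8') as f:
        contents = f.read()
finally:
    os.remove(path)  # clean up the temporary file

found = [match.group() for match in pattern.finditer(contents)]
print(found)  # ['800-555-1234', '900-555-1234']
```

Reading the whole file at once is fine for small files; for very large ones, scanning line by line may be a better fit.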


  • Groups, written with parentheses and |, let a single pattern match several alternatives
pattern = re.compile(r'M(r|s|rs)\.?\s[A-Z]\w*')

matches  = pattern.finditer(text_to_search)

for match in matches:
    print(match)


<re.Match object; span=(225, 235), match='Mr. Powers'>
<re.Match object; span=(236, 244), match='Mr Smith'>
<re.Match object; span=(245, 253), match='Ms Davis'>
<re.Match object; span=(254, 267), match='Mrs. Robinson'>
<re.Match object; span=(268, 273), match='Mr. T'>
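Each pair of parentheses also captures the text it matched, which can be read back with match.group(n). A minimal sketch using the five names from the output above (the variable names are made up):

```python
import re

names = 'Mr. Powers\nMr Smith\nMs Davis\nMrs. Robinson\nMr. T'
pattern = re.compile(r'M(r|s|rs)\.?\s[A-Z]\w*')

full_matches = []
titles = []
for match in pattern.finditer(names):
    full_matches.append(match.group(0))  # group 0 is the entire match
    titles.append(match.group(1))        # group 1 is what (r|s|rs) matched

print(full_matches)  # ['Mr. Powers', 'Mr Smith', 'Ms Davis', 'Mrs. Robinson', 'Mr. T']
print(titles)        # ['r', 'r', 's', 'rs', 'r']
```

For Mrs. the engine first tries r, fails at the following s, and backtracks until the rs alternative succeeds.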

Example: Writing a Regular Expression for Email Addresses

import re

emails = '''

pattern = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

matches = pattern.finditer(emails)

for match in matches:
    print(match)


<re.Match object; span=(1, 23), match=''>
<re.Match object; span=(24, 40), match=''>
<re.Match object; span=(41, 71), match=''>
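The sample addresses are not shown above, so here is a self-contained sketch of the same pattern run against hypothetical addresses on reserved example domains. The addresses are assumptions for illustration, not the data used in this post.

```python
import re

# Hypothetical addresses on reserved example domains.
emails = '''
jane.doe@example.com
j_smith-99@mail-server.example.org
not-an-email
'''

# local part, then @, then a domain label, a literal dot, and the rest of the domain
pattern = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

found = [match.group() for match in pattern.finditer(emails)]
print(found)  # ['jane.doe@example.com', 'j_smith-99@mail-server.example.org']
```

This pattern is a practical approximation; it is not meant to cover every address the email standards allow.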

Working with URLs

  • Writing a regular expression to read URLs and keep only the domain
import re

urls = '''

pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

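# Replace each URL with group 2 (the domain name) followed by group 3 (the top-level domain)
subbed_urls = pattern.sub(r'\2\3', urls)

print(subbed_urls)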
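Because the sample URLs are not shown above, the sketch below repeats the substitution on hypothetical URLs using reserved example domains. Only the pattern and the r'\2\3' replacement come from this post.

```python
import re

# Hypothetical URLs on reserved example domains.
urls = '''
https://www.example.com
http://example.org
https://www.example.net
'''

# group 1: optional www.   group 2: domain name   group 3: top-level domain
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

# \2\3 refers back to groups 2 and 3, dropping the scheme and any www. prefix
subbed_urls = pattern.sub(r'\2\3', urls)
print(subbed_urls)

domains = subbed_urls.split()
print(domains)  # ['example.com', 'example.org', 'example.net']
```

The s? makes the pattern accept both http and https, and (www\.)? makes the www. prefix optional.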