Learn about regular expressions in Python. We introduce the regular expression processing functions commonly used in Python.

Regular expressions are used to test whether a string (text) follows a certain grammar. The grammar is written as a pattern, and a string either matches its rules or does not.

Sometimes Regular Expressions are called regex or regexp. We can easily test if a string contains another string:

>>> s = "Are you afraid of ghosts?"
>>> "ghosts" in s
True

You can also test if a string does not contain a substring:

>>> "coffee" not in s
True

When the text you need to match is more complicated, like a phone number, you can use regular expressions.
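
As a minimal sketch of that idea, the snippet below checks whether a string looks like a phone number. The NNN-NNN-NNNN format and the helper name looks_like_phone are assumptions made for illustration; real phone numbers come in many more formats.

```python
import re

# Assumed format, for illustration only: three digits, three digits,
# four digits, separated by hyphens (e.g. 555-123-4567).
PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")

def looks_like_phone(text):
    """Return True if the whole string has the assumed NNN-NNN-NNNN shape."""
    # fullmatch() succeeds only if the pattern covers the entire string.
    return PHONE_RE.fullmatch(text) is not None

print(looks_like_phone("555-123-4567"))   # True
print(looks_like_phone("call me maybe"))  # False
```

A plain substring test with "in" cannot express "three digits, a hyphen, three digits", which is why a pattern is used here.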

Methods

The regular expression module re provides several functions and methods.
The grammar string itself is called the regular expression, or RE.

The regular expression (re) module contains these important functions:

  • match() determines whether the RE matches at the beginning of the string.

  • search() scans the string and finds the first location where the RE matches.

  • findall() finds all substrings that the RE matches and returns them as a list.

  • finditer() finds all substrings that the RE matches and returns them as an iterator.

The match() function only checks if the RE matches at the beginning of the string, whereas search() scans the entire string.

match() reports a match only if it starts at position 0; a match that starts later in the string is not reported.

search() will scan the entire string and report the first match it finds.
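
The difference can be seen by running both functions on the same text. This is a small sketch; the sample string is made up for illustration.

```python
import re

txt = "I like ghosts"

# match() only tries the pattern at position 0, so "ghosts" is not found.
print(re.match(r"ghosts", txt))          # None

# search() scans forward and reports the first place the pattern matches.
found = re.search(r"ghosts", txt)
print(found.span())                      # (7, 13)

# When the pattern does match at position 0, both functions agree.
print(re.match(r"I", txt).span())        # (0, 1)
print(re.search(r"I", txt).span())       # (0, 1)
```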

re.match

An example of a regular expression is:

>>> import re
>>> txt = "Carl is a cat, he is smart, clever, and more.."
>>> m = re.match(r"(\w+)\s", txt)
>>> if m:
...     print(m.group(0))
... else:
...     print('no match found')
...
Carl

The first parameter is the regular expression, here r"(\w+)\s": one or more word characters followed by a whitespace character. re.match returns a Match object if the match is successful, otherwise None. You can see a list of grammar rules at the bottom of the page.

The second parameter indicates the string to be matched.

You can use a grammar string, called a regular expression or regex, to search for a match. In the example below we test whether the start of the string matches the pattern:

>>> import re
>>> txt = "The number 123456 is my phone number"
>>> result = re.match(r'^The number \d+\s*',txt)
>>> print(result)
<re.Match object; span=(0, 18), match='The number 123456 '>
>>> print(result.group(0))
The number 123456

Other patterns can match the same text. Here the parentheses capture the digits as group 1:

>>> import re
>>> txt = "The number 123456 is my phone number"
>>> result = re.match(r'^The.*?(\d+).*?',txt)
>>> print(result)
<re.Match object; span=(0, 17), match='The number 123456'>
>>> print(result.group(0))
The number 123456
>>> print(result.group(1))
123456
>>>

re.search

The re.search function scans the string and returns a Match object for the first location where the pattern matches. It returns None if the pattern does not match anywhere in the string.

>>> import re
>>> txt = "Sombrero in Spain for fun"
>>> obj = re.search('Spain',txt)
>>> print(obj)
<re.Match object; span=(12, 17), match='Spain'>
>>> print(obj.group(0))
Spain
>>>

The signature of re.search is: re.search(pattern, string, flags=0)

Each parameter has the same meaning as re.match.
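
The flags parameter is not used in the examples on this page. As a brief sketch of what it does, the snippet below passes re.IGNORECASE, one of the standard flags in the re module, to the same search as above.

```python
import re

txt = "Sombrero in Spain for fun"

# Without flags the search is case-sensitive, so lowercase "spain" is not found.
print(re.search("spain", txt))                       # None

# re.IGNORECASE makes letters match regardless of case.
obj = re.search("spain", txt, flags=re.IGNORECASE)
print(obj.group(0), obj.span())                      # Spain (12, 17)
```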

The difference between re.match and re.search:

re.match matches only at the beginning of the string. If the beginning of the string does not match the regular expression, the match fails and the function returns None.

re.search, in contrast, scans the whole string until a match is found.

group method

group() returns the strings matched by the regular expression. You can pass several group numbers at once to get the string matched by each of those groups.

  1. group() or group(0) returns the entire matching string.

  2. group(n, m) returns a tuple with the strings matched by groups n and m. If a group number does not exist, an IndexError exception is raised.

>>> import re
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
>>>
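
The session above asks for one group at a time. The sketch below, using the same pattern, shows the two remaining cases from the list: several group numbers in one call, and a group number that does not exist.

```python
import re

m = re.match(r"(a(b)c)d", "abcd")

# Several group numbers at once give a tuple, in the order requested.
pair = m.group(1, 2)
print(pair)            # ('abc', 'b')

# Asking for a group the pattern does not define raises IndexError.
try:
    m.group(3)
except IndexError as exc:
    error = exc
    print("IndexError:", exc)
```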

groups() returns a tuple containing the strings of all groups in the regular expression, from group 1 up to the last group. It is usually called without an argument.

>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.groups()
('abc', 'b')
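
groups() does accept one optional argument: a default value for groups that did not take part in the match. The sketch below uses a made-up pattern with an optional part, written with the non-capturing form (?:...), which is not covered elsewhere on this page.

```python
import re

# The second group is optional, so it may not take part in the match.
p = re.compile(r"(\d+)(?:\.(\d+))?")

with_fraction = p.match("3.14").groups()
print(with_fraction)                 # ('3', '14')

without_fraction = p.match("42").groups()
print(without_fraction)              # ('42', None)

# The optional argument replaces None for groups that did not participate.
with_default = p.match("42").groups("0")
print(with_default)                  # ('42', '0')
```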

re.findall

re.findall returns all non-overlapping matches in the string as a list. E.g., re.findall(r'\w*oo\w*', text) gets all words in the string that contain 'oo'. If the pattern contains a capturing group, written (pattern), findall returns the text captured by the group instead of the whole match.

>>> import re
>>> txt = "Carl is a cool cat from a good family that and has a happy mood"
>>> re.findall(r'\w*oo\w*',txt)
['cool', 'good', 'mood']
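
What findall puts in the list depends on whether the pattern contains groups. The following sketch runs three variants of the 'oo' pattern on a shortened version of the sentence above.

```python
import re

txt = "Carl is a cool cat from a good family"

# No group: each list item is the whole match.
whole = re.findall(r"\w*oo\w*", txt)
print(whole)        # ['cool', 'good']

# One group: each item is only the text captured by that group.
one_group = re.findall(r"(\w*)oo\w*", txt)
print(one_group)    # ['c', 'g']

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\w*)oo(\w*)", txt)
print(two_groups)   # [('c', 'l'), ('g', 'd')]
```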

re.finditer

You can also use the finditer() function. It scans the string from start to end and yields a Match object for each match, in order.

>>> import re
>>> txt = "Blue blue sky"
>>> pattern = "blue sky"
>>> for match in re.finditer(pattern,txt):
...     s = match.start()
...     e = match.end()
...     print(f'String match {pattern} at {s}:{e}')
...
String match blue sky at 5:13
>>>
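
Because finditer() yields Match objects, positions and matched text can be collected together. This sketch reuses the text above with the shorter pattern "blue" and the standard re.IGNORECASE flag, so that it produces more than one match.

```python
import re

txt = "Blue blue sky"

# finditer() yields Match objects lazily; here each one is turned into a
# (start, end, text) tuple. re.IGNORECASE lets "blue" match "Blue" as well.
hits = [(m.start(), m.end(), m.group(0))
        for m in re.finditer(r"blue", txt, flags=re.IGNORECASE)]

print(hits)   # [(0, 4, 'Blue'), (5, 9, 'blue')]
```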

Grammar rules

Some commonly used special sequences in regular expressions are:

Rule    Description
\d      Matches a decimal digit; equivalent to the set [0-9].
\D      The complement of \d. It matches any non-digit character; equivalent to the set [^0-9].
\s      Matches any whitespace character; equivalent to [ \t\n\r\f\v].
\S      The complement of \s. It matches any non-whitespace character; equivalent to [^ \t\n\r\f\v].
\w      Matches any alphanumeric character; equivalent to [a-zA-Z0-9_].
\W      Matches the complement of \w.
\b      Matches the empty string, but only at the start or end of a word.
\B      Matches the empty string, but not at the start or end of a word.
\\      Matches a literal backslash.
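Note that a regular expression can combine any of the rules above. For str patterns in Python 3, \d, \s and \w also match Unicode digits, whitespace and word characters; the bracketed sets above are exact when the re.ASCII flag is used.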
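
To show how the rules combine, here is a short sketch that uses \b, \d, \w and \s together. The sample sentence is made up for illustration.

```python
import re

txt = "Room 12 costs 80 dollars, room12b is closed"

# \b\d+\b : a run of digits with a word boundary on both sides,
# so the digits inside "room12b" are skipped.
numbers = re.findall(r"\b\d+\b", txt)
print(numbers)      # ['12', '80']

# \w+\s\d+ : a word, one whitespace character, then digits.
labelled = re.findall(r"\w+\s\d+", txt)
print(labelled)     # ['Room 12', 'costs 80']
```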