This is the continuation of Regular Expressions – Quantifiers, in which we will discuss some new concepts in regular expressions: grouping, handling of special characters, the pipe operator, regex flags, and the compile function.
First, we will discuss grouping, i.e., parentheses. So far, we have used quantifiers that apply to a single preceding character in our regular expression patterns. If we wrap characters in parentheses, the quantifier applies to the whole group and looks for repetitions of that group of characters. This concept is known as grouping in regular expressions. For example, the pattern ‘(xyz){1,2}’ (written without a space inside the braces) will match the following strings:
- xyz
- xyzxyz
Similarly, the pattern (abc)+ will match:
- abc
- abcabc
- abcabcabc, and so on.
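A minimal sketch of these two grouped patterns in Python follows. It uses re.fullmatch (rather than the re.search used in the exercises) so that the whole string must consist of repetitions of the group; the variable names pair_pattern and plus_pattern are just illustrative.

```python
import re

# Group "xyz" and allow the group to repeat once or twice
pair_pattern = re.compile(r'(xyz){1,2}')
# Group "abc" and allow the group to repeat one or more times
plus_pattern = re.compile(r'(abc)+')

for text in ['xyz', 'xyzxyz', 'xy']:
    # fullmatch succeeds only if the entire string matches the pattern
    print(text, bool(pair_pattern.fullmatch(text)))

for text in ['abc', 'abcabc', 'abcabcabc', 'ab']:
    print(text, bool(plus_pattern.fullmatch(text)))
```

Strings made of whole copies of the group print True, while the incomplete 'xy' and 'ab' print False.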
Exercise 1: Grouping
Description
Write a regular expression that matches a string where ‘45’ occurs one or more times, followed by ‘37’ occurring one or more times.
Sample positive matches (should match all of these):
4537
45453737
454545453737
Sample negative matches (shouldn’t match any of these):
45
37
45537
445537
44553377
# import regular expression library
import re
# input strings on which to test the regex pattern
sample_sent = ['4537', '45453737', '454545453737', '45', '37', '45537', '445537', '44553377']
pattern = r'(45)+(37)+'
# iterate over the sample sentences and print the match status as True or False
for sent in sample_sent:
    # check whether the pattern is present in the string or not
    result = re.search(pattern, sent)  # pass the arguments to the re.search function
    if result is not None:
        print(True)
    else:
        print(False)
True True True False False False False False
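One detail worth checking here: re.search looks for the pattern anywhere in the string, so a string such as ‘44537’ would still be reported as a match because it contains the substring ‘4537’. A minimal sketch of the difference, assuming we want the whole string to consist of the pattern, uses re.fullmatch instead:

```python
import re

pattern = r'(45)+(37)+'

# re.search finds '4537' inside '44537', so this is a match
print(re.search(pattern, '44537') is not None)           # True
# re.fullmatch requires the entire string to match the pattern
print(re.fullmatch(pattern, '44537') is not None)        # False
print(re.fullmatch(pattern, '454545453737') is not None) # True
```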
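Now, let’s move on to our next notation, the pipe operator, represented by ‘|’ in regular expressions.
The pipe operator represents an OR operation: the pattern matches if either the expression on its left or the one on its right matches. To restrict the OR to part of a pattern, we place the alternatives inside parentheses; without them, the pipe splits the entire pattern into alternatives.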
For example, the pattern ‘(w|t)ear’ will match both the strings – ‘wear’ and ‘tear’.
Similarly, the pattern ‘(SBI|Citi) Bank’ will match both the strings ‘SBI Bank’ and ‘Citi Bank’.
In addition, we can apply quantifiers to a parenthesized group that contains pipe operators, and a group can contain more than one pipe operator.
For example, the pattern ‘(a|b|c){2}’ means ‘exactly two occurrences of any of a, b, or c’, and it will match the strings ‘aa’, ‘ab’, ‘ac’, ‘ba’, ‘bb’, ‘bc’, ‘ca’, ‘cb’, and ‘cc’.
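The short sketch below illustrates how parentheses limit the scope of the pipe operator and how a quantifier applies to a group of alternatives. It uses re.fullmatch and re.findall for compactness; the name two_letters is only illustrative.

```python
import re

# Parentheses limit the alternation to the single letter before "ear"
print(bool(re.fullmatch(r'(w|t)ear', 'wear')))  # True
print(bool(re.fullmatch(r'(w|t)ear', 'tear')))  # True

# Without parentheses, the alternatives are the whole of "w" and "tear"
print(re.findall(r'w|tear', 'wear tear'))       # ['w', 'tear']

# A quantifier on a group of alternatives: exactly two of a, b or c
two_letters = re.compile(r'(a|b|c){2}')
candidates = ['aa', 'ab', 'cb', 'a', 'abc', 'ad']
print([s for s in candidates if two_letters.fullmatch(s)])  # ['aa', 'ab', 'cb']
```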
Exercise 2: Pipe operator
Description
Write a regular expression that matches the following strings:
- Basketball
- Baseball
- Volleyball
- Softball
- Football
# import regular expression library
import re
# input strings on which to test the regex pattern
sample_sent = ['Basketball', 'Baseball', 'Volleyball', 'Softball', 'Football']
pattern = r'(Basket|Base|Volley|Soft|Foot)ball'
# iterate over the sample sentences and print the match status as True or False
for sent in sample_sent:
    # check whether the pattern is present in the string or not
    result = re.search(pattern, sent)  # pass the arguments to the re.search function
    if result is not None:
        print(True)
    else:
        print(False)
True True True True True
Next, we will look at how to handle special characters. We will often find ourselves in situations where we need to match special characters such as ‘?’, ‘*’, ‘+’, ‘(’, ‘)’, ‘{’, etc. literally in our regular expressions.
In such situations, we need to use escape sequences. The escape character, a backslash ‘\’, removes the special meaning of the character that follows it; for example, ‘\*’ matches a literal asterisk.
Note that ‘\’ is itself a special character, so to match a literal ‘\’ we need to escape it too, using the pattern ‘\\’. In Python code, it is easiest to write such patterns as raw strings (e.g. r'\\'), so that Python does not interpret the backslashes before the regex engine sees them.
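A minimal sketch of escaping in Python, using raw strings so the backslashes reach the regex engine unchanged. The last line uses re.escape, a standard-library helper that adds the backslashes automatically.

```python
import re

# Unescaped, '+' is a quantifier: "one or more 5s followed by 3", so no match here
print(bool(re.search(r'5+3', '5+3=8')))         # False
# Escaped, '\+' matches a literal plus sign
print(bool(re.search(r'5\+3', '5+3=8')))        # True
# Escape the parentheses so they are matched as literal characters
print(bool(re.search(r'f\(x\)', 'f(x) = 2x')))  # True
# The regex \\ (written as the raw string r'\\') matches one literal backslash
print(len(re.findall(r'\\', r'C:\temp\new')))   # 2
# re.escape() inserts a backslash before each special character
print(re.escape('a+b*c'))                       # a\+b\*c
```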
Exercise 3: Special character
Description
Write a regular expression that returns True when passed a multiplication equation and False for any other equation. In other words, it should return True if an asterisk (‘*’) is present in the equation.
Sample positive cases (should match all of these):
3a*4b
3*2
4*5*6=120
Sample negative cases (shouldn’t match either of these):
5+3=8
3%2=1
# import regular expression library
import re
# input strings on which to test the regex pattern
sample_sent = ['3a*4b', '3*2', '4*5*6=120', '5+3=8', '3%2=1']
# escape the asterisk so it is matched literally (the raw string keeps the backslash intact)
pattern = r'\*'
# iterate over the sample sentences and print the match status as True or False
for sent in sample_sent:
    # check whether the pattern is present in the string or not
    result = re.search(pattern, sent)  # pass the arguments to the re.search function
    if result is not None:
        print(True)
    else:
        print(False)
True True True False False
As we can see from the above code, the first three multiplication equations are matched correctly whereas the other two equations are not matched.
Next, there are regex flags, which change how a pattern is matched. For example, if we want our regex to ignore the case of the text, we can pass the re.I (re.IGNORECASE) flag to the flags parameter of the re.search function.
Similarly, if the input text spans multiple lines, we can pass the re.M (re.MULTILINE) flag, which makes the anchors ‘^’ and ‘$’ match at the start and end of each line rather than only at the start and end of the whole string. To pass multiple flags to re.search, combine them with the bitwise OR operator ‘|’:
re.search(pattern, string, flags=re.I | re.M)
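A small sketch of the two flags together, assuming a two-line sample string. The ‘^’ anchor (start of string, or of each line under re.M) is covered properly in the next part; here it only serves to make the effect of re.M visible.

```python
import re

text = 'Regex is fun.\nregex is useful.'

# No flags: case-sensitive, and ^ matches only at the start of the whole string
print(re.findall(r'^regex', text))                     # []
# re.I ignores case; re.M lets ^ match at the start of every line
print(re.findall(r'^regex', text, flags=re.I | re.M))  # ['Regex', 'regex']
```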
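Lastly, we will discuss the compile function. re.compile() compiles a regular expression pattern into a pattern object that can be reused for many searches, so the pattern does not have to be processed again each time. Python’s re module also caches recently used patterns internally, so the speed gain over calling re.search directly is often modest; the main benefits are reuse and readability.
To use it, we pass the regex pattern to re.compile() and call methods such as search() on the returned object. The following code snippet shows the difference between searching with and without the compile function.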
import re

# without the re.compile() function
result = re.search("10+", "100")
# using the re.compile() function
pattern = re.compile("10+")
result = pattern.search("100")
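A compiled pattern becomes more useful when the same regex is applied many times. The sketch below compiles a pattern once (flags such as re.I can be passed to re.compile as well) and reuses it in a loop; the name word_pattern and the sample list are illustrative only.

```python
import re

# Compile once, with the ignore-case flag, and reuse the pattern object
word_pattern = re.compile(r'(basket|base|foot)ball', re.I)

for sport in ['Basketball', 'FOOTBALL', 'Volleyball']:
    # the compiled object offers search(), fullmatch(), findall(), etc.
    print(sport, word_pattern.fullmatch(sport) is not None)
```

Here 'Basketball' and 'FOOTBALL' print True because of re.I, while 'Volleyball' prints False since ‘volley’ is not one of the alternatives.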
This concludes our discussion of quantifiers and related concepts in regular expressions. In the next section, we will learn about anchors and wildcards.