Regular Expressions - Quantifiers Part 2

This is a continuation of Regular Expressions – Quantifiers. Here we will discuss some new concepts in regular expressions: grouping, the pipe operator, handling of special characters, regex flags, and the compile function.

First, we will discuss grouping, i.e., parentheses. Until now, the quantifiers in our patterns have applied to the single character preceding them. If we wrap several characters in parentheses, the quantifier instead looks for repetitions of the whole group of characters. This concept is known as grouping. For example, the pattern ‘(xyz){1,2}’ will match the following strings:

  • xyz
  • xyzxyz

Similarly, the pattern (abc)+ will match:

  • abc
  • abcabc
  • abcabcabc, and so on.
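
The grouping behaviour above can be tried out with a short sketch. It assumes Python's built-in re module and uses re.fullmatch() so that the whole string has to fit the pattern; the variable names are only illustrative.

```python
import re

# 'xyz' as a group, repeated one or two times
group_pattern = r"(xyz){1,2}"

for text in ["xyz", "xyzxyz", "xyzxyzxyz", "xy"]:
    # fullmatch() returns a match object only if the entire string fits
    matched = re.fullmatch(group_pattern, text) is not None
    print(text, matched)

# Without parentheses the quantifier applies only to the last character:
# 'xy' followed by one or two 'z'
no_group = re.fullmatch(r"xyz{1,2}", "xyzz") is not None

# With parentheses the whole group 'abc' is repeated
grouped = re.fullmatch(r"(abc)+", "abcabcabc") is not None
```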

Exercise 1: Grouping

Description

Write a regular expression that matches a string in which ’45’ occurs one or more times, followed by ’37’ occurring one or more times.

Sample positive matches (should match all of these):
4537
45453737
454545453737

Sample negative matches (shouldn’t match any of these):
45
37
45537
44537
44553377
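
One possible solution to this exercise, shown as a sketch rather than the only answer, is to group each two-digit sequence and apply ‘+’ to each group. The example assumes the whole string must fit the pattern, so it uses re.fullmatch(); with re.search() a string such as 44537 would match because it contains 4537.

```python
import re

# one or more '45' groups followed by one or more '37' groups
exercise1_pattern = r"(45)+(37)+"

positives = ["4537", "45453737", "454545453737"]
negatives = ["45", "37", "45537", "44537", "44553377"]

for text in positives + negatives:
    # True only when the entire string is '45' repeated, then '37' repeated
    print(text, re.fullmatch(exercise1_pattern, text) is not None)
```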

Now, let’s move to our next notation, i.e., the pipe operator, represented by ‘|’ in regular expressions.

The pipe operator represents an OR operation. To apply it to only part of a pattern, we put the alternatives inside parentheses; without parentheses, the pipe splits the entire pattern into alternatives.

For example, the pattern ‘(w|t)ear’ will match both the strings – ‘wear’ and ‘tear’.

Similarly, the pattern ‘(SBI|Citi) Bank’ will match both the strings ‘SBI Bank’ and ‘Citi Bank’.

In addition, we can apply quantifiers to a group that contains pipe operators, and a single group can contain more than one pipe operator.

The pattern ‘(a|b|c){2}’ means ‘exactly two occurrences of any of a, b or c’, and it will match these strings – ‘aa’, ‘ab’, ‘ac’, ‘ba’, ‘bb’, ‘bc’, ‘ca’, ‘cb’ and ‘cc’.
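
The pipe examples above can be checked with a small sketch. It assumes Python's re module and re.fullmatch(); ‘HDFC Bank’ is a made-up negative case added only for illustration.

```python
import re

bank_pattern = r"(SBI|Citi) Bank"
pair_pattern = r"(a|b|c){2}"

# 'HDFC Bank' is a hypothetical string that should not match
bank_results = {
    text: re.fullmatch(bank_pattern, text) is not None
    for text in ["SBI Bank", "Citi Bank", "HDFC Bank"]
}
print(bank_results)

# Without parentheses the pipe splits the whole pattern:
# it means 'SBI' OR 'Citi Bank', so 'SBI Bank' no longer matches fully
ungrouped = re.fullmatch(r"SBI|Citi Bank", "SBI Bank") is not None

# exactly two characters, each one of a, b or c
pair_ok = re.fullmatch(pair_pattern, "cb") is not None
pair_too_long = re.fullmatch(pair_pattern, "abc") is not None
```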

Exercise 2: Pipe operator

Description

Write a regular expression that matches the following strings: 

  • Basketball 
  • Baseball 
  • Volleyball 
  • Softball 
  • Football
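
One possible solution, given here as a sketch, is to put the differing prefixes in a group separated by pipes and keep the shared ‘ball’ outside it. ‘Handball’ is a hypothetical extra string used only to show a non-match.

```python
import re

# the alternatives share the suffix 'ball'
exercise2_pattern = r"(Basket|Base|Volley|Soft|Foot)ball"

sports = ["Basketball", "Baseball", "Volleyball", "Softball", "Football"]

for sport in sports:
    print(sport, re.fullmatch(exercise2_pattern, sport) is not None)

# 'Handball' is not one of the listed alternatives
handball_matches = re.fullmatch(exercise2_pattern, "Handball") is not None
```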

Next, we will look at handling special characters. We will often find ourselves needing to match characters such as ‘?’, ‘*’, ‘+’, ‘(’, ‘)’, ‘{’, etc. literally in our regular expressions.

In such a situation, we need to escape them. A backslash ‘\’ placed before a special character removes its special meaning, so ‘\?’ matches a literal question mark.

Note that ‘\’ is itself a special character, so to match a literal backslash we need to escape it too: the pattern ‘\\’ matches a single backslash.
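
A short sketch of escaping in Python follows. It assumes raw strings (r"...") so that Python itself does not interpret the backslashes; the sample strings are invented for illustration. The standard re.escape() helper is shown as an alternative to writing the backslashes by hand.

```python
import re

# '\?' matches a literal question mark
question = re.search(r"\?", "Is this a question?")

# '\+' matches a literal plus sign
plus = re.search(r"1\+1", "1+1=2")

# '\\' matches a single literal backslash
backslash = re.search(r"\\", r"C:\temp")

# re.escape() adds the backslashes needed to match a string literally
escaped = re.escape("(a+b)")
literal = re.fullmatch(escaped, "(a+b)")

print(question.group(), plus.group(), backslash.group(), literal.group())
```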

Exercise 3: Special character

Description

Write a regular expression that returns True when passed a multiplication equation. For any other equation, it should return False. In other words, it should return True if there is an asterisk – ‘*’ – present in the equation.

Sample positive cases (should match all of these):
3a*4b
3*2
4*5*6=120

Sample negative cases (shouldn’t match either of these):
5+3=8 
3%2=1
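
One way to solve this exercise, sketched under the assumption that we only need to detect an asterisk anywhere in the string, is to search for the escaped pattern ‘\*’. The function name is_multiplication is hypothetical.

```python
import re

def is_multiplication(equation):
    """Return True if the equation contains a literal asterisk."""
    # '\*' is the escaped form; a bare '*' would be read as a quantifier
    return re.search(r"\*", equation) is not None

for equation in ["3a*4b", "3*2", "4*5*6=120", "5+3=8", "3%2=1"]:
    print(equation, is_multiplication(equation))
```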

With an escaped asterisk, i.e., the pattern ‘\*’, the first three equations are matched because each contains ‘*’, whereas the other two equations are not matched.

Next, there are regex flags, which modify how a pattern is matched. For example, if we want our regex to ignore the case of the text, we can pass the ‘re.I’ flag in the flags parameter of the re.search function.

Similarly, if the input text has multiple lines and we want the anchors ‘^’ and ‘$’ to match at the start and end of each line, we can pass the re.M flag. Multiple flags are combined with ‘|’ when passed to re.search:

re.search(pattern, string, flags=re.I | re.M)
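
The effect of the two flags can be seen in a brief sketch. The two-line sample text is made up for illustration; ‘^’ is the start anchor, which re.M allows to match at the start of every line.

```python
import re

text = "first line\nSecond LINE"

# case-sensitive by default, so lowercase 'second' is not found
case_sensitive = re.search("second", text)

# re.I ignores case
ignore_case = re.search("second", text, flags=re.I)

# without re.M, '^' matches only at the very start of the string
start_only = re.search("^second", text, flags=re.I)

# with re.M, '^' also matches at the start of each line
multi_line = re.search("^second", text, flags=re.I | re.M)

print(case_sensitive, ignore_case.group(), start_only, multi_line.group())
```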

Lastly, we will discuss the compile function. re.compile() converts a pattern into a pattern object that we can store in a variable and reuse, so the pattern does not have to be processed again for each search; this can make repeated searches with the same pattern faster.

For this, we pass the regex pattern to the re.compile() function. The following code snippet shows the difference between searching with and without the compile function.

import re

# without re.compile() function
result = re.search("10+", "100")

# using the re.compile() function
pattern = re.compile("10+")
result = pattern.search("100")
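
The benefit of a compiled pattern shows up mainly when the same pattern is reused. The sketch below, with an invented list of sample strings, compiles the pattern once and then searches several strings with it.

```python
import re

# compile once, reuse many times
ten_pattern = re.compile("10+")

samples = ["100", "1", "2100", "abc"]

# keep only the strings that contain '1' followed by one or more '0'
matches = [text for text in samples if ten_pattern.search(text)]
print(matches)
```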

This brings us to the end of quantifiers in regular expressions. In the next section, we will learn about anchors and wildcards.

Proceed to Regular Expression: Anchors and Wildcards