In this lesson on regular expressions, we will learn about anchors and wildcard characters.

First, we look at anchor characters, which mark the start and end of a string. There are two anchor characters: ‘^’ and ‘$’. The first, ‘^’, specifies the start of the string: the character after ‘^’ in the pattern must be the first character of the string for the pattern to match.

In contrast, ‘$’ specifies the end of the string: the character preceding ‘$’ in the pattern must be the last character of the string for the pattern to match.
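
As a small illustration of the two anchors, here is a minimal sketch. The sample words are made up for this example and are not part of the exercises below.

```python
import re

# hypothetical sample words chosen for illustration
words = ['cat', 'concat', 'catalog']

# '^cat' requires the string to begin with 'cat'
starts = [re.search('^cat', w) is not None for w in words]
# 'cat$' requires the string to end with 'cat'
ends = [re.search('cat$', w) is not None for w in words]

print(starts)  # [True, False, True]
print(ends)    # [True, True, False]
```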

We can also use both anchors in a single regular expression. For example, the pattern ‘^ab*a$’ matches any string that starts and ends with an ‘a’, with any number of b’s between them. It will match ‘aba’, ‘abba’, ‘abbbbbbba’, and even ‘aa’ (as ‘*’ matches zero or more b’s). But it will not match the string ‘a’, because the pattern requires two a’s, one at the start and one at the end, and this string contains only one.
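
The claims above can be checked with a short sketch. The extra string ‘abab’ is an assumption added here to show a string that starts with ‘a’ but does not end with one.

```python
import re

pattern = '^ab*a$'
# 'abab' is an extra illustrative case, not taken from the lesson text
candidates = ['aba', 'abba', 'abbbbbbba', 'aa', 'a', 'abab']

# map each candidate string to whether the pattern matches it
results = {s: re.search(pattern, s) is not None for s in candidates}
for s, matched in results.items():
    print(s, matched)
```

Only the first four strings match; ‘a’ has a single ‘a’, and ‘abab’ ends with ‘b’.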

Exercise 1: Anchors

Description

Write a pattern that matches all the dictionary words that start with ‘D’

Positive matches (should match all of these):
Delight
diligence
Danger

Negative match (shouldn’t match any of these):
Bribe
10
Zenith

In [1]:

# import regular expression library
import re

In [30]:

# input strings on which to test the regex pattern
sample_sent = ['Delight', 'diligence', 'Danger', 'Bribe', '10', 'Zenith']
pattern = '^D'
# iterate over the sample strings and print the match status as True or False
for sent in sample_sent:
    # check whether the string starts with the pattern (re.I ignores case)
    result = re.search(pattern, sent, re.I)
    if result is not None:
        print(True)
    else:
        print(False)

True
True
True
False
False
False

As the results above show, the pattern with the ‘^’ anchor matched all the words starting with the letter ‘D’. We also passed re.I as the flags argument so that the match ignores case.
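
To see what the re.I flag contributes, the following sketch runs the same pattern with and without it on the three positive examples from the exercise description.

```python
import re

words = ['Delight', 'diligence', 'Danger']

# without re.I, '^D' only matches an uppercase 'D' at the start
case_sensitive = [re.search('^D', w) is not None for w in words]
# with re.I, a lowercase 'd' at the start matches as well
case_insensitive = [re.search('^D', w, re.I) is not None for w in words]

print(case_sensitive)    # [True, False, True]
print(case_insensitive)  # [True, True, True]
```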

Exercise 2: Anchors

Description

Write a pattern that matches a word that ends with ‘ing’. Words such as ‘dancing’, ‘beating’, ‘raining’, etc. should match while words that don’t have ‘ing’ at the end shouldn’t match.

In [1]:

# import regular expression library
import re

In [31]:

# input strings on which to test the regex pattern
sample_sent = ['dancing', 'playing', 'beating', 'raining', 'build', 'haze']
pattern = 'ing$'
# iterate over the sample strings and print the match status as True or False
for sent in sample_sent:
    # check whether the string ends with the pattern
    result = re.search(pattern, sent, re.I)
    if result is not None:
        print(True)
    else:
        print(False)

True
True
True
True
False
False

From the above result, we can see that the pattern with the ‘$’ anchor matched all the words ending with ‘ing’.
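
The anchor matters here because re.search looks for the pattern anywhere in the string. A minimal sketch, using two made-up words (‘singer’ and ‘ingot’) that contain ‘ing’ but do not end with it:

```python
import re

# 'singer' and 'ingot' are illustrative words, not from the exercise
words = ['raining', 'singer', 'ingot']

# without '$', 'ing' may appear anywhere in the word
anywhere = [re.search('ing', w) is not None for w in words]
# with '$', 'ing' must be the last three characters
at_end = [re.search('ing$', w) is not None for w in words]

print(anywhere)  # [True, True, True]
print(at_end)    # [True, False, False]
```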

Exercise 3: Anchors

Description

Write a regular expression that matches any string that starts with one or more ‘a’s, followed by three or more ‘b’s, followed by any number of ‘a’s (zero or more), followed by ‘b’s (from one to seven), and then ends with either two or three ‘a’s.

In [1]:

# import regular expression library
import re

In [32]:

# input string on which to test the regex pattern
sample_sent = ['aabbbbaabbbaaa']
pattern = '^a+b{3,}a*b{1,7}a{2,3}$'
# iterate over the sample strings and print the match status as True or False
for sent in sample_sent:
    # check whether the whole string fits the pattern
    result = re.search(pattern, sent, re.I)
    if result is not None:
        print(True)
    else:
        print(False)

True

As the code above shows, this more complex pattern combines both anchors with several quantifiers, because the conditions constrain both the start and the end of the string.
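
One way to read a pattern like this is to split it into its parts. The sketch below rewrites the same pattern with re.VERBOSE so that each piece carries a comment. The extra test strings are assumptions added here for illustration.

```python
import re

# same pattern as above, spread over several lines with re.VERBOSE
pattern = re.compile(r"""
    ^a+       # one or more a's at the start
    b{3,}     # three or more b's
    a*        # zero or more a's
    b{1,7}    # one to seven b's
    a{2,3}$   # two or three a's at the end
""", re.VERBOSE)

# the first string is from the exercise; the others are illustrative
tests = ['aabbbbaabbbaaa', 'abbbbaa', 'abbaa', 'abbbba']
results = [pattern.search(s) is not None for s in tests]
print(results)  # [True, True, False, False]
```

‘abbaa’ fails because the two ‘b’ groups together need at least four b’s, and ‘abbbba’ fails because it ends with only one ‘a’.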

Next, we look at a special character in regular expressions: the ‘.’ (dot) character, also known as the wildcard character. The wildcard acts as a placeholder and matches any single character in the input string (in Python, any character except a newline by default).
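
A minimal sketch of the wildcard in action, using a made-up pattern ‘^b.t$’ (a ‘b’, then exactly one character of any kind, then a ‘t’). It also shows that, in Python, the dot does not match a newline unless the re.DOTALL flag is passed.

```python
import re

# illustrative pattern and strings, not taken from the exercises
pattern = '^b.t$'
words = ['bat', 'b9t', 'b t', 'bt', 'boot', 'b\nt']

# the dot stands for exactly one character
results = [re.search(pattern, w) is not None for w in words]
print(results)  # [True, True, True, False, False, False]

# with re.DOTALL the dot also matches the newline character
dotall = re.search(pattern, 'b\nt', re.DOTALL) is not None
print(dotall)  # True
```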

Exercise 4: Wildcard

Description

Write a regular expression to match first names (consider only first names, i.e. there are no spaces in a name) that have lengths of between five and ten characters.

Sample positive match:
Amandeep
Krishna

Sample negative match:
Ram

In [39]:

# import regular expression library
import re

In [48]:

# input strings on which to test the regex pattern
sample_sent = ['Amandeep', 'Krishna', 'Ram']
# the anchors make sure the whole name is five to ten characters long
pattern = '^.{5,10}$'
# iterate over the sample strings and print the match status as True or False
for sent in sample_sent:
    # check whether the whole string fits the pattern
    result = re.search(pattern, sent)
    if result is not None:
        print(True)
    else:
        print(False)

True
True
False
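
Anchors matter for a length check like this one. An unanchored ‘.{5,10}’ is satisfied by any five to ten consecutive characters, so it also accepts names longer than ten characters. The sketch below compares the unanchored pattern, the anchored pattern, and re.fullmatch, which requires the whole string to match. The 14-character name is an assumption added for illustration.

```python
import re

# 'Venkataramanan' (14 characters) is an illustrative over-long name
names = ['Amandeep', 'Ram', 'Venkataramanan']

# unanchored: any 5 to 10 consecutive characters are enough
unanchored = [re.search('.{5,10}', n) is not None for n in names]
# anchored: the entire string must be 5 to 10 characters long
anchored = [re.search('^.{5,10}$', n) is not None for n in names]
# re.fullmatch gives the same result without explicit anchors
full = [re.fullmatch('.{5,10}', n) is not None for n in names]

print(unanchored)  # [True, False, True]
print(anchored)    # [True, False, False]
print(full)        # [True, False, False]
```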

So, now we have learned how the ‘.’ character can act as a placeholder for matching any character. In the next lesson, we will study character sets.
