In this lesson on regular expressions, we will learn about anchors and wildcard characters.
We start with anchor characters, which mark the start and end of a string. There are two anchors: ‘^’ and ‘$’. The ‘^’ anchor specifies the start of the string: whatever follows ‘^’ in the pattern must match at the very beginning of the string.
Conversely, ‘$’ specifies the end of the string: whatever precedes ‘$’ in the pattern must match at the very end of the string.
Both anchors can be used in a single regular expression. For example, the pattern ‘^ab*a$’ matches any string that starts and ends with an ‘a’, with any number of ‘b’s in between. It matches ‘aba’, ‘abba’, ‘abbbbbbba’, and even ‘aa’ (because ‘*’ matches zero or more ‘b’s). It does not match the string ‘a’, because the pattern requires two ‘a’s, one at the start and one at the end, and this string has only one.
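As a quick check of the pattern discussed above, the following sketch runs ‘^ab*a$’ against those strings. The variable names and the extra string ‘abab’ are illustrative only and do not come from the lesson.

```python
import re

# Pattern from the text: an 'a', zero or more 'b's, then a final 'a'
pattern = r'^ab*a$'

# 'abab' is an extra, illustrative negative case
candidates = ['aba', 'abba', 'abbbbbbba', 'aa', 'a', 'abab']

# Map each candidate to True/False depending on whether the pattern matches
results = {s: re.search(pattern, s) is not None for s in candidates}

for s, matched in results.items():
    print(s, matched)
```

Only ‘a’ and ‘abab’ fail here: the first has a single ‘a’, and the second has a trailing ‘b’ after the final ‘a’ that the ‘$’ anchor does not allow.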
Exercise 1: Anchors
Description
Write a pattern that matches all the dictionary words that start with ‘D’
Positive matches (should match all of these):
Delight
diligence
Danger
Negative match (shouldn’t match any of these):
Bribe
10
Zenith
# import regular expression library
import re
# input words on which to test the regex pattern
sample_sent = ['Delight', 'diligence', 'Danger', 'Bribe', '10', 'Zenith']
pattern = '^D'
# iterate over the sample words and print the match status as True or False
for sent in sample_sent:
    # check whether the pattern matches; re.I makes the match case-insensitive
    result = re.search(pattern, sent, re.I)
    if result is not None:
        print(True)
    else:
        print(False)
True True True False False False
As the results show, the pattern with the ‘^’ anchor matched all the words starting with the letter ‘D’. We also passed re.I as the flags argument so that the match ignores case.
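To see what the re.I flag contributes, here is a small sketch that runs the same ‘^D’ pattern with and without it. The word list is a shortened, illustrative subset of the exercise data.

```python
import re

words = ['Delight', 'diligence', 'Bribe']

# Without the flag, '^D' only matches an uppercase 'D' at the start
case_sensitive = [re.search(r'^D', w) is not None for w in words]

# With re.I (short for re.IGNORECASE), 'd' and 'D' are treated alike
case_insensitive = [re.search(r'^D', w, re.I) is not None for w in words]

print(case_sensitive)
print(case_insensitive)
```

The lowercase word ‘diligence’ is accepted only in the second run, which is why the flag matters for this exercise.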
Exercise 2: Anchors
Description
Write a pattern that matches a word that ends with ‘ing’. Words such as ‘dancing’, ‘beating’, ‘raining’, etc. should match while words that don’t have ‘ing’ at the end shouldn’t match.
# import regular expression library
import re
# input words on which to test the regex pattern
sample_sent = ['dancing', 'playing', 'beating', 'raining', 'build', 'haze']
pattern = 'ing$'
# iterate over the sample words and print the match status as True or False
for sent in sample_sent:
    # check whether the pattern matches at the end of the word
    result = re.search(pattern, sent, re.I)
    if result is not None:
        print(True)
    else:
        print(False)
True True True True False False
From the above result, we can see that the pattern with the ‘$’ anchor matched all the words ending with ‘ing’.
Exercise 3: Anchors
Description
Write a regular expression that matches any string that starts with one or more ‘a’s, followed by three or more ‘b’s, followed by any number of ‘a’s (zero or more), followed by one to seven ‘b’s, and then ends with either two or three ‘a’s.
# import regular expression library
import re
# input string on which to test the regex pattern
sample_sent = ['aabbbbaabbbaaa']
pattern = '^a+b{3,}a*b{1,7}a{2,3}$'
# iterate over the sample strings and print the match status as True or False
for sent in sample_sent:
    # check whether the whole string fits the pattern
    result = re.search(pattern, sent, re.I)
    if result is not None:
        print(True)
    else:
        print(False)
True
As the code shows, this more complex pattern combines both anchors with several quantifiers, because it constrains both the start and the end of the string.
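One way to read a pattern like this is piece by piece. The sketch below rewrites the same pattern using the re.VERBOSE flag, which lets us spread it over several lines with comments, and tries it on a few extra strings. Apart from the first one, the test strings are made up for illustration.

```python
import re

# Same pattern as in the exercise, split up with re.VERBOSE
# (whitespace and '#' comments inside the pattern are ignored)
pattern = re.compile(r'''
    ^a+        # starts with one or more 'a's
    b{3,}      # three or more 'b's
    a*         # zero or more 'a's
    b{1,7}     # one to seven 'b's
    a{2,3}$    # ends with two or three 'a's
''', re.VERBOSE)

tests = [
    'aabbbbaabbbaaa',   # the string from the exercise
    'abbbbaa',          # middle 'a's omitted, which a* allows
    'abbbaa',           # only three 'b's in total: too few
    'aabbbbaabbba',     # ends with a single 'a'
    'bbbbaabbbaaa',     # does not start with 'a'
]

outcomes = [pattern.search(t) is not None for t in tests]
print(outcomes)
```

Note that ‘abbbaa’ fails because ‘b{3,}’ and ‘b{1,7}’ together need at least four ‘b’s when no ‘a’ separates them.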
Next, we will look at a special character in regular expressions: the ‘.’ (dot) character, also known as the wildcard. The wildcard acts as a placeholder and matches any single character in the input string (by default, any character except a newline).
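A short sketch of the wildcard in action: the pattern ‘^c.t$’ below requires exactly one character, of any kind, between ‘c’ and ‘t’. The pattern and sample strings are illustrative and not part of the lesson's exercises. In Python's re module the dot does not match a newline unless the re.DOTALL flag is passed, which the last check demonstrates.

```python
import re

# Exactly one arbitrary character between 'c' and 't'
pattern = r'^c.t$'

samples = ['cat', 'cut', 'c9t', 'c t', 'ct', 'cart', 'c\nt']

# True where the string has exactly one character between 'c' and 't'
dot_results = [re.search(pattern, s) is not None for s in samples]
print(dot_results)

# With re.DOTALL the dot also matches the newline character
newline_with_dotall = re.search(pattern, 'c\nt', re.DOTALL) is not None
print(newline_with_dotall)
```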
Exercise 4: Wildcard
Description
Write a regular expression to match first names (consider only first names, i.e. there are no spaces in a name) that have lengths of between five and ten characters.
Sample positive match:
Amandeep
Krishna
Sample negative match:
Ram
# import regular expression library
import re
# input names on which to test the regex pattern
sample_sent = ['Amandeep', 'Krishna', 'Ram']
# anchors ensure the whole name is 5 to 10 characters long;
# without them, re.search would also accept longer names
pattern = '^.{5,10}$'
# iterate over the sample names and print the match status as True or False
for sent in sample_sent:
    # check whether the entire name fits the pattern
    result = re.search(pattern, sent)
    if result is not None:
        print(True)
    else:
        print(False)
True True False
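One caveat worth checking with this exercise: re.search looks for the pattern anywhere in the string, so a bare ‘.{5,10}’ only guarantees that the name has at least five characters. The sketch below compares that unanchored search with re.fullmatch, which requires the whole string to match. The 14-character name is a made-up test value.

```python
import re

# 'Venkataramanan' (14 characters) is a hypothetical over-long name
names = ['Amandeep', 'Krishna', 'Ram', 'Venkataramanan']

# Unanchored search: finds 5 to 10 characters somewhere inside the name
unanchored = [re.search(r'.{5,10}', n) is not None for n in names]

# fullmatch: the entire name must be 5 to 10 characters long
anchored = [re.fullmatch(r'.{5,10}', n) is not None for n in names]

print(unanchored)
print(anchored)
```

Using re.fullmatch here behaves like wrapping the pattern in ‘^’ and ‘$’, so either approach enforces the upper limit of ten characters.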
So, now we have learned how the ‘.’ character can act as a placeholder for matching any character. In the next lesson, we will study character sets.