Given an input string and a regex pattern, implement regular expression matching with support for ‘.’ and ‘*’.
- ‘.’ matches any single character
- ‘*’ matches zero or more of the preceding element (the char before *)
Examples
- s = ‘aa’, p = ‘ab’ Not a match
- s= ‘bbbbb’, p = ‘b*’ Match because ‘*’ matches zero or more of the preceding element b
- s= ‘fljflf’, p = ‘.*’ Match because we want to match zero or more of the preceding element which is any character in this case
- s = ‘oop’, p = ‘r*o*p’ Match because r is present 0 times, o is present twice and p is a direct match
Thought Process
Let’s approach this problem step by step.
Case 1: The simplest case would be when the given string and the pattern do not contain any ‘.’ or ‘*’ characters but only alphabets.
s = 'aa' and p = 'ab'
In this case we would check the string and the pattern from left to right to see if each character in the string matches the characters in the pattern.
a = a matched so far
b != a Match failed so return False
Case 2: When no ‘*’ characters are present but only ‘.’. In this case ‘.’ can match any single character. So we check for that condition and continue to check the remaining characters in the given string and the pattern using recursion.
def isRegexMatch(s, pattern):
# If pattern is empty, as long as string is also empty => true
# If pattern is empty but string is not empty => return false
if not pattern:
return not s
# Check the first character match in s and pattern
# String should not be empty
# The first characters exactly match (Or)
# pattern's first char contains "." to match any char in s
is_first_char_match = len(s) > 0 and (s[0] == pattern[0] or pattern[0] == ".")
# Continue checking the remaining characters using recursion
return is_first_char_match and isRegexMatch(s[1:], pattern[1:])
Case 3: When the regex pattern contains both ‘.’ and ‘*’ characters, things get a little complicated.
Note that the pattern will never begin with ‘*’ because it denotes that a preceding character can be present 0 or more times. So it will ALWAYS be present only as the second character when that part of the pattern is being processed after a recursive call, For example: a*, .*
So we can have a condition that evaluates this logic as shown below.
if len(pattern) >= 2 and pattern[1] == "*"
If the above condition holds true, then we check regex match differently due to the presence of ‘*’. Let’s think about how we can do that. There are two valid scenarios that are allowed here.
Scenario 1: s = "aaaa", pattern = "c*a*"
In this case len(pattern) = 4 which is >= 2 and pattern[1] = ‘*‘. We see that c is not present in the string s but it can occur zero 0 or more times. In this case we can ignore checking the first two characters of the pattern and continue the recursion as shown below.
isRegexMatch(s, pattern[2:]) => isRegexMatch("aaaa", "a*")
Scenario 2: s = "aaaa", pattern = "a*"
In this case len(pattern) = 2 which is >= 2 and pattern[1] = ‘*‘. s contains ‘a’ which is a direct match with the first character in the pattern. So we check for first character match and continue recursion on the rest of the string.
is_first_char_match and isRegexMatch(s[1:], pattern) => a = a and isRegexMatch("aaa", "a*")
Note that we pass the full pattern string to the recursion call here because ‘*’ can match zero or more characters of ‘a’. So we want to check the string for that as well. The following recursive calls will hold true.
isRegexMatch("aaa", "a*") =>
is_first_char_match and isRegexMatch(s[1:], pattern) => a = a and isRegexMatch("aa", "a*")
isRegexMatch("aa", "a*") =>
is_first_char_match and isRegexMatch(s[1:], pattern) => a = a and isRegexMatch("a", "a*")
isRegexMatch("a", "a*") =>
is_first_char_match and isRegexMatch(s[1:], pattern) => a = a and isRegexMatch("", "a*")
isRegexMatch("", "a*") =>
isRegexMatch(s, pattern[2:]) => isRegexMatch("", "") so final result will be true.
if len(pattern) >= 2 and pattern[1] == "*":
return isRegexMatch(s, pattern[2:]) or (is_first_char_match and isRegexMatch(s[1:], pattern))
Complexity
For a given string of length S and pattern P, we make recursive calls using indexes s[i:] and pattern[2j:] (When ‘*’ is present). The recursive tree can go to a depth of (S+P/2) which will have 2h leaf nodes where h = (S+P/2), so 2(S+P/2) leaf nodes. This means the recursion will branch into 2(S+P/2) sub-problems.
If we pass the entire string and pattern at the worst case, the complexity of solving a sub-problem will be O(S+P). So the time complexity will roughly be
O((S+P) * 2(S+P/2))
To read more about the complexity calculation for this problem, here’s a detailed article.
Putting it all together.
Solution
def isRegexMatch(s, pattern):
# If pattern is empty, as long as string is also empty => true
# If pattern is empty but string is not empty => return false
if not pattern:
return not s
# Check the first character match in s and pattern
# String should not be empty
# The first characters exactly match (Or)
# pattern's first char contains "." to match any char in s
is_first_char_match = len(s) > 0 and (s[0] == pattern[0] or pattern[0] == ".")
# When '*' is present in the second char of pattern
if len(pattern) >= 2 and pattern[1] == "*":
return isRegexMatch(s, pattern[2:]) or (is_first_char_match and isRegexMatch(s[1:], pattern))
else:
# Continue checking the remaining characters using recursion
return is_first_char_match and isRegexMatch(s[1:], pattern[1:])
Hope you enjoyed solving the regular expression problem. In the next post, we will look at how to improve upon this solution using Dynamic Programming!
