Posted in Arrays and Strings, Data Structures/ Leetcode, Dynamic Programming

Regular Expression Matching Using Dynamic Programming

This is a follow up on the previous post where we solved Regular Expression Matching using simple recursion. In this post we will solve the same problem using Dynamic Programming. Dynamic Programming (DP) is an algorithmic technique for solving an optimization problem by breaking it down into simpler subproblems where the optimal solution to the problem is obtained by solving for optimal solutions to the subproblems.

Problem Statement

Given an input string and a regex pattern, implement regular expression matching with support for ‘.’ and ‘*’.

  • ‘.’ matches any single character
  • ‘*’ matches zero or more of the preceding element (the char before *)

Examples

  1. s = ‘aa’, p = ‘ab’ Not a match
  2. s= ‘bbbbb’, p = ‘b*’ Match because ‘*’ matches zero or more of the preceding element b
  3. s= ‘fljflf’, p = ‘.*’ Match because we want to match zero or more of the preceding element which is any character in this case
  4. s = ‘oop’, p = ‘r*o*p’ Match because r is present 0 times, o is present twice and p is a direct match

Thought Process

In any dynamic programming problem we should first define the subproblem. Let’s consider an example where s = ‘bbbbb’ and p = ‘b*b*c*’. If we use the recursive solution to solve the problem, the recursion tree will branch out as shown below. Although not all the branches are represented in the diagram, we can clearly see that some of the subproblems are repeatedly solved over and over again and this is an expensive computation. One way to avoid solving repeated problems is to remember the results of a computation using a technique called Memoization.

Each subproblem call will resemble isRegexMatch(s[i:], pattern[j:]) where i and j are indexes used to slice the string and the pattern respectively. So we do not have to memoize the entire substring and substring of the pattern. So instead of

mem[(s[i:], pattern[j:])) = result

We will use only i and j to index the result in mem such as

mem[(i, j)] = result

We will therefore use the exact same solution from our previous solution except this time we will store the results for future lookup.

Complexity

For a string of length S and pattern of length P, we call dp(i, j) at the worst case for all the possible subproblems for i = 0 to S and j = 0 to P. So the time complexity will be O(SP).

Same for space complexity since we are caching the results of the subproblems O(SP).

Solution

def isRegexMatch(s, pattern):

    mem = {}
        
    def dp(i, j):
        if (i, j) not in mem:
            if j == len(pattern): # Check if pattern is empty, if so string s should also be empty
                mem[(i,j)] = i == len(s)
            else:
                # Check the first character match in s and pattern
                # String should not be empty
                # The first characters exactly match (Or)
                # pattern's first char contains "." to match any char in s
                is_first_char_match = len(s) > i and (s[i] == pattern[j] or pattern[j] == ".")
                
                # When '*' is present in the second char of pattern    
                if j < len(pattern) - 1 and pattern[j+1] == "*":
                    mem[(i,j)] = dp(i, j+2) or (is_first_char_match and dp(i+1, j))
                else: 
                    # Continue checking the remaining characters using recursion
                    mem[(i,j)] = is_first_char_match and dp(i+1, j+1) 
                
        return mem[(i,j)]
    return dp(0, 0)

Hope you enjoyed solving this dynamic programming example using Memoization technique. See you in the next post!

Posted in Data Structures/ Leetcode, Dynamic Programming, June Leetcoding Challenge

Coin Change – Dynamic Programming

We are given an infinite supply of coins of ‘n’ different denominations and a total amount T. We have to find the total number of distinct ways in which we can make up this amount.

Input: amount = 5, coins = [1, 2, 5]
Output: 4
Explanation: There are four ways to make up the amount 5.
5=5
5=2+2+1
5=2+1+1+1
5=1+1+1+1+1

Infinite supply of coins means that we can choose the same coin multiple times in order to make up the total amount 5.

Solution #1: Let’s start with a brute force approach which is to try all the possible combinations from the given coins. Each combination is a subset of coins. For each coin, we can make two different decisions.

  • If the coin amount does not exceed the total give amount, we can INCLUDE THE COIN in our subset combination and then recursively make a decision on the remaining coins including the current coin, since we can repeatedly choose the same coin to get T.
  • Another option is to EXCLUDE THE COIN and create a new subset combination with the remaining coins.

The above points make up the core algorithm of this problem. Let’s say the total amount is T and C is the current denomination of the coin that we are processing. Then,

  • If we include the first coin C, the sum left to make up to T will be T – C and we process the list coins[0…n-1] since we are allowed to select the same coin again.
  • If we exclude the first coin, the sum left to make up to T will remain the same and we will process the list coins[1…n-1].

This gives way to a recursion tree as shown below.

Recursion tree

Time Complexity: This algorithm has exponential time complexity O(2C+T) where C is the total coins we are given and T is the total amount. This is because for each coin c and amount t, we have to make two decisions ie., branch into two recursive sub-problems of including and excluding the coin.

Space Complexity: O(2C+T) since the recursion will use stack space that is as deep as the recursion tree.

Code for solution #1:

def change(self, amount: int, coins: List[int]) -> int:
    
    def countHelper(coins, totalAmount, coinIndex):
        
        # If the amount is 0, there is only one way to make this
        # Just exclude the coin
        if totalAmount == 0:
            return 1
        
        n = len(coins)
        # No coins left or all coins have been processed
        if n == 0 or currentIndex >= n:
            return 0
        
        includeCoinSum = 0
        if coins[coinIndex] <= totalAmount:
            includeSum = countHelper(coins, totalAmount - coins[coinIndex], coinIndex)
            
        excludeCoinSum = countHelper(coins, totalAmount, coinIndex + 1)
        
        return includeCoinSum + excludeCoinSum
    
    return countHelper(coins, amount, 0)

Solution #2: It’s pretty clear that an exponential algorithm is not efficient. This is when dynamic programming comes into action. Why is this problem a good candidate for dynamic programming?

Overlapping sub-problems: As we branch down the tree, we saw in the recursion tree image that we compute the total number of ways for some coin c and amount t repeatedly. If the tree size grows, the repetitions will be even higher. This means we can cache the results of a sub-problem and then re-use it the next time we want to compute the same. This is called memoization technique.

Optimal sub-structure: In any given problem, if we are able to compute the optimal solution for a larger problem by computing the optimal results for its smaller sub-problems, then it follows the optimal sub-structure property.

How can we choose what to memoize? To figure this part out, we look at the two variables that are constantly changing state in solution #1 code. This is the coinIndex and the totalAmount values which keep changing during every recursive call. So we store the count value (number of ways) to make an amount totalAmount with the coin at coinIndex during each recursive call. Our memoization array is a 2D array dp[len(coins)][totalAmount + 1] and the number of sub-problems that we need to solve will be C * T ie., for each grid in the 2D array.

Time Complexity: O(C * T) to compute all the sub-problems.

Space Complexity: O(C * T) for the dp array and the recursion costs another O(C) stack space, which adds up to O(C * T + C). So the overall space complexity will be O(C * T).

Code for solution #2:

def change(self, amount: int, coins: List[int]) -> int:
    
    def countHelper(dp, coins, totalAmount, coinIndex):
        
        # If the amount is 0, there is only one way to make this
        # Just exclude the coin
        if totalAmount == 0:
            return 1
        
        n = len(coins)
        # No coins left or all coins have been processed
        if n == 0 or currentIndex >= n:
            return 0
        
        if dp[coinIndex][totalAmount] != -1:
            return dp[coinIndex][totalAmount]
        
        includeCoinSum = 0
        if coins[coinIndex] <= totalAmount:
            includeSum = countHelper(dp, coins, totalAmount - coins[coinIndex], coinIndex)
            
        excludeCoinSum = countHelper(dp, coins, totalAmount, coinIndex + 1)
        
        dp[coinIndex][totalAmount] = includeCoinSum + excludeCoinSum
        return dp[coinIndex][totalAmount]
    
    dp = [[-1 for _ in range(amount+1)] for _ in range(len(coins))]
    
    return countHelper(dp, coins, amount, 0)

The above approach is called top down dynamic programming. This is because we start with the larger problem and recursively break it down into smaller sub-problems and compute their results. We can avoid recursion by approaching this problem bottom-up. In this case, we compute the results of sub-problems before using them to compute the larger problem. This method is called tabulation.

Solution #3:

We can use the same 2D array dp[len(coins)][totalAmount + 1] to populate the sub-problem results but this time we start from the smallest sub-problem. For this to make sense, we need to understand what a single sub-problem in this array means. dp[c][t] means the total number of ways to make up an amount t by including or excluding the coin at index c.

  • When we exclude a coin at index c, we can make a total t using dp[c – 1][t] number of ways. We use the coins up until the index c – 1 to make the amount t but we skip the coin at c.
  • When we include the coin at index c, we have to subtract its value from the total amount t ie., dp[c][t – coins[c]]

Adding the above two results gives the result for dp[c][t] ie. the total number of ways to make t using the coins up until index c.

dp[c][t] = dp[c – 1][t] + dp[c][t – coins[c]]

The dp array contains the following values for the given problem. Our answer is in the bottom right grid of the dp array when we compute dp[C][T] using all the sub-problems.

Tabulation

Time Complexity: O(C * T) to populate the dp array.

Space Complexity: O(C * T) for the 2D array.

Code for the above solution:

def change(self, amount: int, coins: List[int]) -> int:

    if not amount:
        return 1

    if not coins:
        return 0

    n = len(coins)
    dp = [[0 for i in range(amount+1)] for _ in range(n)]

    for i in range(n):
        dp[i][0] = 1

    for c in range(n):
        for t in range(1, amount+1):
            dp[c][t] = dp[c-1][t]

            if coins[c] <= t:
                dp[c][t] += dp[c][t - coins[c]]

    return dp[n-1][amount]

Hope you enjoyed learning how to solve the coin change problem using dynamic programming. Have a great day!