10. Regular Expression Matching

problem description

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Note:

s could be empty and contains only lowercase letters a-z.
p could be empty and contains only lowercase letters a-z, and characters like . or *.

Example 1:

Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

Example 3:

Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Example 4:

Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore, it matches "aab".

Example 5:

Input:
s = "mississippi"
p = "mis*is*p*."
Output: false

algorithm thought

正则表达式问题，两个字符串匹配。碰到字符串问题，一般都是动态规划问题，或者是字符串匹配的一些专门的算法，比如kmp这种。还有就是很多字符串，如果想要找到前缀重合之类的问题，就是trie字典树。正则表达式问题就是典型的动态规划解决的题目。使用一个二维数组保存每一个状态，横纵坐标ij表示s[0-i]和p[0-j]的匹配结果。每一个状态转移的时候，一定要分析全要考虑的前一个状态。这一题中就*号很特殊，他可以是一个也可以是零个也可以是多个。分析到*号的时候，需要考虑的前一个状态有点多。

分析一次example4的二维数组是如何构造的

   '' c  *  a  *  b
'' 1  0  1  0  1  0
a  0  0  0  1  1  0
a  0  0  0  0  1  0
b  0  0  0  0  0  1

首先考虑一般情况，如果两个字符相等了，只需要查看左上角的bool值即可，如果匹配当前也是匹配的。如果是一个点，那也好说，直接按照相等来匹配。关键就是如何匹配*。我们来回顾一下*的定义，首先可以是0个，0个的话，直接查看靠左两个方格的bool值即可，比如c*和空字符比较，也就是第一排前三个。两个空字符肯定是匹配的，值为1，c和空字符肯定不匹配，格子值为0，c*是和空字符匹配的，因为可以是0个c，所以，直接观察左边两个格子的值即可。然后考虑1个的情况，现在可以换一种思想，如果当前行列匹配需要1个字符，那么上一行和当前列匹配肯定是只需要0个字符，如果上一行成功匹配，那么这一行肯定也成功。同理可得1个或者多个匹配的话，只需要看上一个格子是否满足条件即可。

可能讲述的还是不清晰，建议先做简单的动态规划解法字符串匹配问题。比如最长公共子序列等问题，再来做这题会清晰很多。

code

class Solution {
public:
    bool isMatch(string s, string p) {
        vector<vector<bool>> dp(s.size()+1,vector<bool>(p.size()+1,false));
        dp[0][0]=true;
        
        for(int i=1;i<=p.size();++i)
            if(p[i-1]=='*')
                dp[0][i]=dp[0][i-2];
        
        for(int i=1;i<=s.size();++i)
            for(int j=1;j<=p.size();++j){
                if(s[i-1]==p[j-1]||p[j-1]=='.')
                    dp[i][j]=dp[i-1][j-1];
                if(p[j-1]=='*'){
                    if((s[i-1]!=p[j-2])&&p[j-2]!='.')
                        dp[i][j]=dp[i][j-2];
                    else
                        dp[i][j]=(dp[i-1][j]||dp[i][j-2]);
                }
            }
        return dp[s.size()][p.size()];
    }
};

algorithm analysis

定义了一个二维数组，如果s的长度为m，p的长度为n。二维数组就是一个m*n的矩形。算法需要遍历这个二维数组，循环中时间复杂度为O(1)，最后时间复杂度是O(mn)

Previous9. Palindrome Number Next11. Container With Most Water

Last updated 5 years ago