LeetCode 3455 - Shortest Matching Substring

The problem asks us to find the length of the shortest substring in a given string s that matches a pattern string p, where the pattern contains exactly two wildcard '' characters.

LeetCode Problem 3455

Difficulty: 🔴 Hard
Topics: Two Pointers, String, Binary Search, String Matching

Solution

Problem Understanding

The problem asks us to find the length of the shortest substring in a given string s that matches a pattern string p, where the pattern contains exactly two wildcard '*' characters. Each '*' matches any sequence of zero or more characters, which means it can represent nothing, a single character, or a long substring.

The input consists of two strings: s of length up to 100,000 and p of length up to 100,000. The output is a single integer representing the minimum length of a substring of s that satisfies the pattern p, or -1 if no such substring exists. Notably, the empty substring is considered valid if it satisfies the pattern (for example, if the pattern is "**").

Key insights from the constraints:

  • s and p can be large, so any algorithm that checks all substrings of s is too slow.
  • p always contains exactly two '*', which allows us to split it into three fixed parts: a prefix before the first '*', a middle part between the '*', and a suffix after the second '*'.
  • We must handle the fact that '*' can match zero characters, meaning the fixed parts can touch or even overlap.

Important edge cases include:

  • Patterns consisting solely of wildcards ("**"), which should match the empty substring.
  • Cases where the fixed parts of the pattern are longer than s or cannot be aligned in any order.
  • Cases where the fixed parts appear multiple times and the algorithm must select the shortest substring.

Approaches

Brute Force

The naive approach would consider every substring of s and check if it matches the pattern p by simulating the wildcard matching. This is correct because it exhaustively explores all possibilities, but it is far too slow: there are O(n²) substrings in s, and checking each substring could take O(m) time, leading to a total complexity of O(n² * m), which is unacceptable for n, m up to 10^5.

Optimal Approach

The key observation is that the pattern has exactly two wildcards, which splits it into three fixed segments. Let these segments be left, middle, and right. To find the shortest matching substring in s, we can:

  1. Precompute the earliest positions in s where left and middle appear in order.
  2. Precompute the latest positions in s where middle and right appear in reverse order.
  3. For each possible alignment of these segments, calculate the substring length covering them.
  4. Track the minimum length.

This approach is efficient because it only requires linear scans of s to find potential matches of each segment, avoiding the need to examine every substring explicitly.

Approach Time Complexity Space Complexity Notes
Brute Force O(n² * m) O(1) Check all substrings and match against the pattern
Optimal O(n * m) O(n) Use two pointers and precompute forward/backward match positions

Algorithm Walkthrough

  1. Split the pattern: Identify the positions of the two '*' characters and split p into left, middle, right.
  2. Forward matching: Scan s from left to right. For each position, check if left matches starting there. If it does, scan forward to see where middle matches. Record the earliest positions where these segments can align.
  3. Backward matching: Scan s from right to left. For each position, check if right matches ending there. If it does, scan backward to see where middle matches. Record the latest positions where these segments can align.
  4. Combine results: For every valid forward and backward alignment of left-middle-right, calculate the substring length from the start of left to the end of right.
  5. Return minimum: Keep track of the minimum length found. If no valid alignment exists, return -1. If the pattern is "**", return 0.

Why it works: This algorithm guarantees that every possible alignment of left, middle, and right in s is considered without explicitly iterating over all substrings. By storing earliest and latest positions, we ensure we find the shortest substring that satisfies the pattern.

Python Solution

class Solution:
    def shortestMatchingSubstring(self, s: str, p: str) -> int:
        if p == "**":
            return 0

        stars = [i for i, c in enumerate(p) if c == '*']
        left, middle, right = p[:stars[0]], p[stars[0]+1:stars[1]], p[stars[1]+1:]

        n = len(s)
        # Forward positions
        left_pos = []
        i = 0
        while i <= n - len(left):
            if s[i:i+len(left)] == left:
                left_pos.append(i)
            i += 1

        # Backward positions
        right_pos = []
        i = n - len(right)
        while i >= 0:
            if s[i:i+len(right)] == right:
                right_pos.append(i)
            i -= 1
        right_pos.reverse()

        min_len = float('inf')
        for start in left_pos:
            mid_start = start + len(left)
            # Find earliest right position that comes after start + len(left) + len(middle)
            for end in right_pos:
                if end >= mid_start:
                    total_len = end + len(right) - start
                    min_len = min(min_len, total_len)
                    break

        return min_len if min_len != float('inf') else -1

Explanation: We first check for the trivial "**" case. Then, we split the pattern into three segments. Forward and backward scans find possible start and end positions for left and right. For each forward start of left, we find the earliest right that can form a valid substring covering middle. The minimum length is tracked throughout.

Go Solution

func shortestMatchingSubstring(s string, p string) int {
    if p == "**" {
        return 0
    }

    stars := []int{}
    for i, c := range p {
        if c == '*' {
            stars = append(stars, i)
        }
    }

    left, middle, right := p[:stars[0]], p[stars[0]+1:stars[1]], p[stars[1]+1:]
    n := len(s)

    leftPos := []int{}
    for i := 0; i <= n-len(left); i++ {
        if s[i:i+len(left)] == left {
            leftPos = append(leftPos, i)
        }
    }

    rightPos := []int{}
    for i := n - len(right); i >= 0; i-- {
        if s[i:i+len(right)] == right {
            rightPos = append(rightPos, i)
        }
    }
    for i, j := 0, len(rightPos)-1; i < j; i, j = i+1, j-1 {
        rightPos[i], rightPos[j] = rightPos[j], rightPos[i]
    }

    minLen := n + 1
    for _, start := range leftPos {
        midStart := start + len(left)
        for _, end := range rightPos {
            if end >= midStart {
                totalLen := end + len(right) - start
                if totalLen < minLen {
                    minLen = totalLen
                }
                break
            }
        }
    }

    if minLen == n+1 {
        return -1
    }
    return minLen
}

Go differences: Slices are used instead of lists, and reversing rightPos is done manually. Otherwise, the logic mirrors Python. Edge cases and substring comparisons are handled identically.

Worked Examples

Example 1: s = "abaacbaecebce", p = "ba*c*ce"

  • left = "ba", middle = "c", right = "ce"
  • Forward scan finds left at positions 1 and 5.
  • Backward scan finds right at positions 9 and 11.
  • Alignment: start=5, end=12 → substring "baecebce" → length 8.

Example 2: s = "baccbaadbc", p = "cc*baa*adb"

  • left = "cc", middle = "baa", right = "adb"
  • Forward scan finds left at position 1.
  • Backward scan finds right not existing → return -1.

Example 3: s = "a", p = "**"

  • Trivial wildcard → return 0.

Example 4: s = "madlogic", p = "*adlogi*"

  • left = "", middle = "adlogi", right = ""
  • Forward scan finds middle at index 1 → substring "adlogi" → length 6.

Complexity Analysis

Measure Complexity Explanation
Time O(n * m) Forward and backward scans check