LeetCode 3934 - Smallest Unique Subarray
The problem asks us to find the smallest possible length of a contiguous subarray such that there exists at least one subarray of that length that appears exactly once in the entire array.
Difficulty: 🔴 Hard
Topics: Array, Hash Table, Binary Search, Rolling Hash, Suffix Array, Hash Function
Solution
Problem Understanding
The problem asks us to find the smallest possible length of a contiguous subarray such that there exists at least one subarray of that length that appears exactly once in the entire array.
In other words, for a fixed length L, we look at every subarray of size L and count how many times each distinct sequence occurs. If at least one of those sequences occurs exactly once, then L is considered valid. Our goal is to find the minimum such valid L.
The input is a single integer array nums of size up to 10^5, and each element is an integer up to 10^5. The output is a single integer representing the smallest length satisfying the uniqueness condition.
The constraints imply that we cannot afford naive substring enumeration with deep comparisons. A naive O(n^3) approach would be far too slow, and even O(n^2) per length would be too large. This strongly suggests that we need hashing, prefix preprocessing, or a binary search optimization.
Edge cases include arrays of size 1, arrays where all elements are identical, and arrays where all elements are distinct. These cases test whether the solution correctly handles maximal repetition and maximal uniqueness scenarios.
Approaches
Brute Force Approach
The brute force approach considers every possible subarray length L from 1 to n. For each L, it generates all subarrays of that length, compares them pairwise, and counts occurrences. If any subarray appears exactly once, we record L as a candidate answer.
This works because it directly enforces the definition of the problem, but it is too slow. There are O(n) lengths, O(n) subarrays per length, and comparing subarrays costs O(L), leading to a cubic worst-case complexity.
Optimal Approach: Binary Search + Rolling Hash
The key insight is that we can efficiently test whether a given length L is valid by checking if any subarray of length L appears exactly once. We can compute hashes for all subarrays of length L in O(n) using a rolling hash and store their frequencies in a hash map.
We then use binary search over L because the property is monotonic in practice for this problem: if there exists a unique subarray of length L, then increasing the length generally reduces repetition, making uniqueness easier to achieve. Thus, we can find the minimum valid L efficiently.
To avoid recomputing subarray values, we use prefix rolling hashes so each subarray hash can be computed in O(1).
Complexity Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^3) | O(1) | Compare all subarrays directly |
| Optimal (Binary Search + Rolling Hash) | O(n log n) | O(n) | Efficient frequency counting using hashing |
Algorithm Walkthrough
- We define a helper function
exists_unique(L)that determines whether there is at least one subarray of lengthLthat appears exactly once. This function is the core check used by binary search. - Inside
exists_unique(L), we compute prefix hashes of the array using a polynomial rolling hash. This allows us to compute any subarray hash in constant time, which is critical for efficiency. - We iterate over all subarrays of length
Lusing a sliding window. For each window, we compute its hash and store its frequency in a hash map. - After processing all subarrays of length
L, we scan the frequency map to check if any hash has frequency exactly equal to 1. If such a hash exists, we return true. - We perform binary search on
Lfrom 1 ton. For each midpointmid, we checkexists_unique(mid). If true, we attempt smaller lengths; otherwise, we increase the length. - The final answer is the smallest
Lfor whichexists_unique(L)returns true.
Why it works
The correctness relies on the fact that subarray uniqueness becomes easier to achieve as length increases because longer sequences have fewer opportunities to repeat identically. Combined with exact frequency counting using hashing, this ensures that once a valid length is found, we can safely search for smaller valid lengths without missing the true minimum.
Python Solution
from typing import List
from collections import defaultdict
class Solution:
def smallestUniqueSubarray(self, nums: List[int]) -> int:
n = len(nums)
base = 91138233
mod = 10**9 + 7
prefix = [0] * (n + 1)
power = [1] * (n + 1)
for i in range(n):
prefix[i + 1] = (prefix[i] * base + nums[i]) % mod
power[i + 1] = (power[i] * base) % mod
def get_hash(l: int, r: int) -> int:
return (prefix[r] - prefix[l] * power[r - l]) % mod
def exists_unique(L: int) -> bool:
freq = defaultdict(int)
for i in range(n - L + 1):
h = get_hash(i, i + L)
freq[h] += 1
for v in freq.values():
if v == 1:
return True
return False
left, right = 1, n
answer = n
while left <= right:
mid = (left + right) // 2
if exists_unique(mid):
answer = mid
right = mid - 1
else:
left = mid + 1
return answer
Implementation Explanation
We first build prefix hashes and power arrays to allow O(1) subarray hash computation. The get_hash function extracts a normalized hash for any subarray.
The exists_unique function computes frequencies of all subarrays of a fixed length L. It uses a hash map to count occurrences and then checks if any frequency equals 1.
Finally, binary search is used to minimize L, repeatedly refining the search space based on whether a valid unique subarray exists.
Go Solution
package main
func smallestUniqueSubarray(nums []int) int {
n := len(nums)
base := int64(91138233)
mod := int64(1000000007)
prefix := make([]int64, n+1)
power := make([]int64, n+1)
power[0] = 1
for i := 0; i < n; i++ {
prefix[i+1] = (prefix[i]*base + int64(nums[i])) % mod
power[i+1] = (power[i] * base) % mod
}
getHash := func(l, r int) int64 {
val := (prefix[r] - prefix[l]*power[r-l]) % mod
if val < 0 {
val += mod
}
return val
}
existsUnique := func(L int) bool {
freq := make(map[int64]int)
for i := 0; i+L <= n; i++ {
h := getHash(i, i+L)
freq[h]++
}
for _, v := range freq {
if v == 1 {
return true
}
}
return false
}
left, right := 1, n
ans := n
for left <= right {
mid := (left + right) / 2
if existsUnique(mid) {
ans = mid
right = mid - 1
} else {
left = mid + 1
}
}
return ans
}
Go-Specific Notes
Go requires explicit handling of modular arithmetic underflow, so we normalize negative hash values. We also use map[int64]int for frequency counting instead of a Python dictionary. Slice initialization for prefix and power arrays is explicit, and integer types are fixed-width int64 to avoid overflow during multiplication.
Worked Examples
Example 1: nums = [3,3,3]
For L = 1, subarrays are [3] occurring 3 times, so none are unique. For L = 2, [3,3] occurs twice, so none are unique. For L = 3, [3,3,3] occurs once, so L = 3 is valid and returned.
| L | Subarrays | Frequencies | Unique Exists |
|---|---|---|---|
| 1 | [3],[3],[3] | 3 | No |
| 2 | [3,3],[3,3] | 2 | No |
| 3 | [3,3,3] | 1 | Yes |
Example 2: nums = [2,1,2,3,3]
For L = 1, subarrays are [2],[1],[2],[3],[3]. The value [1] occurs once, so L = 1 is immediately valid.
Example 3: nums = [1,1,2,2,1]
For L = 1, no value occurs exactly once. For L = 2, all subarrays [1,1],[1,2],[2,2],[2,1] occur once each, so L = 2 is valid.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Binary search over L, each check scans all subarrays using O(1) hashing |
| Space | O(n) | Prefix arrays and hash map for frequency counting |
The logarithmic factor comes from binary search over possible lengths, and each feasibility check runs in linear time over the array using precomputed rolling hashes.
Test Cases
assert Solution().smallestUniqueSubarray([3,3,3]) == 3 # all duplicates until full length
assert Solution().smallestUniqueSubarray([2,1,2,3,3]) == 1 # single unique element exists
assert Solution().smallestUniqueSubarray([1,1,2,2,1]) == 2 # first unique subarrays appear at length 2
assert Solution().smallestUniqueSubarray([1]) == 1 # single element
assert Solution().smallestUniqueSubarray([1,2,3,4]) == 1 # all elements unique
assert Solution().smallestUniqueSubarray([5,5,5,5]) == 4 # only full array unique
assert Solution().smallestUniqueSubarray([1,2,1,2,1]) == 2 # mixed repetition
| Test | Why |
|---|---|
| [3,3,3] | tests full repetition |
| [2,1,2,3,3] | early unique element |
| [1,1,2,2,1] | uniqueness emerges at L=2 |
| [1] | minimal size array |
| [1,2,3,4] | all singletons |
| [5,5,5,5] | uniqueness only at max length |
| [1,2,1,2,1] | overlapping patterns |
Edge Cases
One important edge case is when the array has length 1. In this case, the only subarray is the array itself, which is trivially unique, so the answer is always 1. The implementation correctly handles this because the binary search range starts at 1 and immediately succeeds.
Another edge case is when all elements are identical. In this scenario, no subarray of length less than n will be unique because every subarray of a given length is identical to others. The algorithm correctly increases L until it reaches n.
A third edge case is when all elements are distinct. Here, every subarray of length 1 is unique, so the answer should be 1. The frequency check correctly identifies singleton occurrences at the smallest length.
A final subtle case involves repeated patterns such as alternating sequences. These can produce multiple overlapping identical subarrays, and correctness depends entirely on accurate hashing and frequency counting, which the rolling hash ensures.