LeetCode 3859 - Count Subarrays With K Distinct Integers

This problem asks us to count subarrays of a given integer array nums such that each subarray contains exactly k distinct integers, and each distinct integer in the subarray appears at least m times.

LeetCode Problem 3859

Difficulty: 🔴 Hard
Topics: Array, Hash Table, Sliding Window, Counting

Solution

Problem Understanding

This problem asks us to count subarrays of a given integer array nums such that each subarray contains exactly k distinct integers, and each distinct integer in the subarray appears at least m times. In other words, for a subarray to be valid, it must satisfy two constraints: a cardinality constraint (exactly k distinct numbers) and a frequency constraint (each distinct number appears ≥ m times).

The input nums is an array of integers, with a length up to 10^5. Each element is also up to 10^5. The integers k and m are positive and bounded by the array length. The output is a single integer representing the number of valid subarrays.

Important edge cases include arrays where k > length of nums or m is larger than the frequency of any element, as these make valid subarrays impossible. Another subtle case is when nums contains repeated elements, because a naive approach might miss subarrays that overlap or fail to count them multiple times if needed.

Approaches

The brute-force approach would involve generating all possible subarrays and counting the distinct integers and their frequencies for each subarray. While this guarantees correctness, it is infeasible because the number of subarrays of an array of length n is O(n^2), and checking the frequency for each subarray adds another O(n) in the worst case. This leads to an O(n^3) algorithm, which will not work for n up to 10^5.

The key insight for an optimal approach is that we can use a sliding window with a frequency hashmap to efficiently maintain the count of distinct integers and their frequencies in the current window. By carefully expanding and contracting the window, we can count subarrays satisfying exactly k distinct numbers, all appearing at least m times, without generating every subarray explicitly. The problem can be reduced to counting windows where the frequency condition is satisfied, leveraging two pointers and a hashmap to track counts.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^3) O(n) Generate all subarrays, count distinct numbers and frequencies for each
Optimal Sliding Window O(n) amortized O(n) Use two pointers and hashmap to maintain counts; count valid subarrays efficiently

Algorithm Walkthrough

  1. Initialize two pointers left and right to represent the sliding window. Create a hashmap freq to store the count of each integer in the current window, and a variable distinctValid to track how many integers currently satisfy the ≥ m frequency requirement.
  2. Move the right pointer through the array, adding nums[right] to freq. If adding this element makes its frequency exactly m, increment distinctValid because this integer now satisfies the frequency requirement.
  3. While the window contains more than k distinct integers, increment left to shrink the window. If removing nums[left] drops its frequency below m, decrement distinctValid.
  4. Whenever the window has exactly k distinct integers and all of them appear at least m times (distinctValid == k), count the number of valid subarrays ending at right. All subarrays starting from left up to the current leftBound are valid because increasing left further would violate the distinct count or frequency condition.
  5. Continue expanding right and adjusting left until the end of the array. Sum up the valid subarrays found.

Why it works: The sliding window maintains the invariant that all elements inside satisfy the minimum frequency condition and that distinct counts are tracked. By counting subarrays dynamically as the window expands, we avoid recomputing frequencies for overlapping windows, ensuring correctness and efficiency.

Python Solution

class Solution:
    def countSubarrays(self, nums: list[int], k: int, m: int) -> int:
        from collections import defaultdict

        freq = defaultdict(int)
        left = 0
        count = 0
        distinctValid = 0
        n = len(nums)

        for right in range(n):
            freq[nums[right]] += 1
            if freq[nums[right]] == m:
                distinctValid += 1

            while len(freq) > k:
                if freq[nums[left]] == m:
                    distinctValid -= 1
                freq[nums[left]] -= 1
                if freq[nums[left]] == 0:
                    del freq[nums[left]]
                left += 1

            # Check if current window is valid
            if len(freq) == k and distinctValid == k:
                tempLeft = left
                while tempLeft <= right and all(v >= m for v in freq.values()):
                    count += 1
                    freq[nums[tempLeft]] -= 1
                    if freq[nums[tempLeft]] == m - 1:
                        distinctValid -= 1
                    if freq[nums[tempLeft]] == 0:
                        del freq[nums[tempLeft]]
                    tempLeft += 1

                # Restore freq and distinctValid
                for i in range(left, tempLeft):
                    freq[nums[i]] += 1
                    if freq[nums[i]] == m:
                        distinctValid += 1

        return count

The Python implementation maintains a sliding window with two pointers. freq tracks frequencies, and distinctValid ensures each distinct integer meets the minimum frequency. When a valid window is found, the inner loop counts all subarrays by shrinking from the left while still satisfying the frequency requirement, then restores the state for the next iteration.

Go Solution

func countSubarrays(nums []int, k int, m int) int64 {
    freq := make(map[int]int)
    var count int64
    left := 0
    distinctValid := 0

    for right := 0; right < len(nums); right++ {
        freq[nums[right]]++
        if freq[nums[right]] == m {
            distinctValid++
        }

        for len(freq) > k {
            if freq[nums[left]] == m {
                distinctValid--
            }
            freq[nums[left]]--
            if freq[nums[left]] == 0 {
                delete(freq, nums[left])
            }
            left++
        }

        if len(freq) == k && distinctValid == k {
            tempLeft := left
            for tempLeft <= right {
                valid := true
                for _, v := range freq {
                    if v < m {
                        valid = false
                        break
                    }
                }
                if !valid {
                    break
                }
                count++
                if freq[nums[tempLeft]] == m {
                    distinctValid--
                }
                freq[nums[tempLeft]]--
                if freq[nums[tempLeft]] == 0 {
                    delete(freq, nums[tempLeft])
                }
                tempLeft++
            }

            for i := left; i < tempLeft; i++ {
                freq[nums[i]]++
                if freq[nums[i]] == m {
                    distinctValid++
                }
            }
        }
    }

    return count
}

The Go version mirrors the Python logic. Go requires explicit deletion from the map and careful handling of integer types to avoid overflow. Slice indexing is straightforward and maps replace Python's defaultdict.

Worked Examples

Example 1: nums = [1,2,1,2,2], k = 2, m = 2

Right Window freq distinctValid Valid Subarrays Added
0 [1] {1:1} 0 0
1 [1,2] {1:1,2:1} 0 0
2 [1,2,1] {1:2,2:1} 1 0
3 [1,2,1,2] {1:2,2:2} 2 +1
4 [1,2,1,2,2] {1:2,2:3} 2 +1

Total count = 2

Example 2: nums = [3,1,2,4], k = 2, m = 1

Right Window freq distinctValid Valid Subarrays Added
0 [3] {3:1} 1 0
1 [3,1] {3:1,1:1} 2 +1
2 [1,2] {1:1,2:1} 2 +1
3 [2,4] {2:1,4:1} 2 +1

Total count = 3

Complexity Analysis

Measure Complexity Explanation
Time O(n) amortized Each element is added and removed at most twice from the sliding window and frequency map
Space O(n) Hashmap stores counts of elements in the current window; at worst all elements are distinct

The amortized time is O(n)