LeetCode 3926 - Count Valid Word Occurrences
The problem asks us to count occurrences of specific words in a string that is formed by concatenating an array of smaller string "chunks".
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Counting
Solution
Problem Understanding
The problem asks us to count occurrences of specific words in a string that is formed by concatenating an array of smaller string "chunks". The twist is that words are defined in a non-standard way: they can include joiner hyphens, which are hyphens that appear between two lowercase English letters. Any other character, including spaces or non-joiner hyphens, acts as a separator.
The inputs are chunks, a list of strings that may contain letters, spaces, and hyphens, and queries, a list of words whose occurrences we need to count. Each query is guaranteed to be a valid word: it starts and ends with a lowercase letter and does not contain consecutive hyphens. The output is an array of integers corresponding to the count of each query word in the fully concatenated string s.
Key constraints indicate that the combined size of chunks and queries can be up to $10^5$ characters, so any solution that iterates inefficiently over all substrings will be too slow. Important edge cases include hyphens at the edges of strings, consecutive hyphens, and spaces.
Approaches
A brute-force approach would concatenate all chunks into a single string, split it using spaces or hyphens as separators, and then count exact matches for each query. The challenge here is correctly identifying which hyphens are joiner hyphens. This approach works correctly but may be inefficient because it requires parsing the entire string multiple times and storing all words.
The optimal approach is to scan the concatenated string once, building words on the fly and checking for joiner hyphens. We maintain a dictionary (hash map) that counts the frequency of each word. Once all words are counted, we simply look up each query in the dictionary to produce the result. This method avoids repeatedly scanning the string and ensures an efficient solution.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * m) | O(n) | Concatenate and split string, then count matches per query |
| Optimal | O(n + q) | O(n) | Single scan to build word frequencies, then query in O(1) per query |
Where $n$ is the total length of all chunks and $q$ is the number of queries.
Algorithm Walkthrough
-
Concatenate all strings in
chunksto forms. -
Initialize an empty dictionary
word_countto keep track of word frequencies. -
Initialize an empty string
current_wordto accumulate letters and joiner hyphens. -
Iterate through each character in
s: -
If the character is a lowercase letter, append it to
current_word. -
If the character is a hyphen
-, check if it is a joiner hyphen by verifying that both the previous and next characters are lowercase letters. If so, append it tocurrent_word. Otherwise, finalizecurrent_wordas a complete word and reset it. -
For any other separator (space, invalid hyphen), finalize
current_wordas a complete word and reset it. -
After the iteration, ensure the last
current_word(if non-empty) is added toword_count. -
For each query, retrieve its count from
word_count(defaulting to 0 if not present) and append to the result array. -
Return the result array.
Why it works: The algorithm guarantees that all words are identified according to the problem definition. Each character is examined exactly once, and each word is counted correctly in a hash map, allowing efficient lookups for queries.
Python Solution
class Solution:
def countWordOccurrences(self, chunks: list[str], queries: list[str]) -> list[int]:
s = ''.join(chunks)
word_count = {}
n = len(s)
i = 0
while i < n:
if s[i].islower():
start = i
while i < n:
if s[i].islower():
i += 1
elif s[i] == '-' and i > start and i + 1 < n and s[i-1].islower() and s[i+1].islower():
i += 1
else:
break
word = s[start:i]
word_count[word] = word_count.get(word, 0) + 1
else:
i += 1
return [word_count.get(q, 0) for q in queries]
The implementation starts by concatenating chunks to simplify processing. A while loop scans each character, carefully handling joiner hyphens by checking adjacent characters. Each word found is counted in word_count. Finally, we generate the result array by looking up each query's frequency.
Go Solution
func countWordOccurrences(chunks []string, queries []string) []int {
s := ""
for _, chunk := range chunks {
s += chunk
}
wordCount := make(map[string]int)
n := len(s)
i := 0
for i < n {
if s[i] >= 'a' && s[i] <= 'z' {
start := i
for i < n {
if s[i] >= 'a' && s[i] <= 'z' {
i++
} else if s[i] == '-' && i > start && i+1 < n && s[i-1] >= 'a' && s[i-1] <= 'z' && s[i+1] >= 'a' && s[i+1] <= 'z' {
i++
} else {
break
}
}
word := s[start:i]
wordCount[word]++
} else {
i++
}
}
result := make([]int, len(queries))
for idx, q := range queries {
result[idx] = wordCount[q]
}
return result
}
In Go, we handle strings as byte slices implicitly. The logic mirrors Python: a single pass counts all valid words, respecting joiner hyphens, and the final result array is built by looking up each query in the map.
Worked Examples
Example 1:
chunks = ["hello wor","ld hello"] → s = "hello world hello"
Scanning produces words: "hello", "world", "hello"
Word count dictionary: {"hello": 2, "world": 1}
Queries: ["hello","world","wor"] → result [2, 1, 0]
Example 2:
chunks = ["a-b a--b ","a-","b"] → s = "a-b a--b a-b"
Scanning produces words: "a-b", "a", "b", "a-b"
Word count dictionary: {"a-b": 2, "a": 1, "b": 1}
Queries: ["a-b","a","b"] → result [2, 1, 1]
Example 3:
chunks = ["-cat dog- mouse"] → s = "-cat dog- mouse"
Scanning produces words: "cat", "dog", "mouse"
Word count dictionary: {"cat": 1, "dog": 1, "mouse": 1}
Queries: ["cat","dog","mouse","cat-dog"] → result [1, 1, 1, 0]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + q) | Single pass through concatenated string of length n, plus O(1) lookup per query for q queries |
| Space | O(n) | Storing all words in a hash map may require space proportional to n |
The algorithm efficiently handles large input sizes because each character is examined exactly once, and dictionary lookups are amortized O(1).
Test Cases
# Provided examples
assert Solution().countWordOccurrences(["hello wor","ld hello"], ["hello","world","wor"]) == [2,1,0]
assert Solution().countWordOccurrences(["a-b a--b ","a-","b"], ["a-b","a","b"]) == [2,1,1]
assert Solution().countWordOccurrences(["-cat dog- mouse"], ["cat","dog","mouse","cat-dog"]) == [1,1,1,0]
# Edge case: single chunk, single word
assert Solution().countWordOccurrences(["abc"], ["abc"]) == [1]
# Edge case: hyphens at edges
assert Solution().countWordOccurrences(["-a-b-"], ["a-b","a","b"]) == [1,0,0]
# Multiple joiner hyphens
assert Solution().countWordOccurrences(["x-y-z"], ["x-y-z","x-y","y-z","x","y","z"]) == [1,0,0,0,0,0]
# Spaces and multiple separators
assert Solution().countWordOccurrences(["a b--c d-e "], ["a","b","c","d-e"]) == [1,1,1,1]
| Test | Why |
|---|---|
| Single chunk, single word | Minimal input |
| Hyphens at edges | Ensures non-joiner hyphens are correctly ignored |
| Multiple joiner hyphens |