LeetCode 3793 - Find Users with High Token Usage
The prompts table records every prompt submitted by a user to an AI system. Each row contains a userid, the prompt text itself, and the number of tokens consumed by that prompt.
Difficulty: 🟢 Easy
Topics: —
Solution
Problem Understanding
The prompts table records every prompt submitted by a user to an AI system. Each row contains a user_id, the prompt text itself, and the number of tokens consumed by that prompt.
The task is to analyze prompt usage on a per-user basis and return only users who satisfy two separate conditions.
First, the user must have submitted at least three prompts. This means we need to count how many rows belong to each user and filter out users with fewer than three submissions.
Second, the user must have at least one prompt whose token count is strictly greater than that user's own average token usage. The comparison is not against a global average, but against the average calculated only from that user's prompts.
For every qualifying user, we must output:
user_idprompt_count, the total number of prompts submittedavg_tokens, the average number of tokens per prompt rounded to two decimal places
The final result must be sorted by:
avg_tokensin descending orderuser_idin ascending order when averages are equal
The input is a relational table rather than an array or graph. This is a SQL aggregation problem where we need to compute per-user statistics and then filter users based on those statistics.
An important observation is that if a user has multiple prompts and not all prompt token values are identical, then at least one prompt must be greater than the average. Conversely, if every prompt has exactly the same token count, then no prompt is greater than the average. The query therefore needs to verify the existence of a prompt with tokens > average_tokens_for_that_user.
Important edge cases include users with exactly three prompts, users whose prompts all have identical token counts, users with only one or two prompts, and users whose average is a non-integer value requiring rounding in the output.
Approaches
Brute Force Approach
A brute force solution would process each user independently. For every distinct user, we could scan the entire table to collect that user's prompts, calculate the count and average, then scan the user's prompts again to determine whether any prompt exceeds the average.
This approach is correct because it explicitly computes all required statistics and directly checks the filtering condition. However, it repeatedly scans the table for each user. If there are U users and N rows, the total work can approach O(U × N), which becomes inefficient as the dataset grows.
Key Insight
The key observation is that all required information is aggregated at the user level.
We need:
- Count of prompts per user
- Average tokens per user
- Whether a maximum token value exceeds the average
Instead of repeatedly scanning the table, we can group rows by user_id once. SQL aggregation functions such as COUNT(), AVG(), and MAX() naturally compute these values.
The condition "at least one prompt greater than the user's average" can be expressed efficiently as:
MAX(tokens) > AVG(tokens)
If the maximum token count is greater than the average, then at least one prompt exceeds the average. If all prompts are equal, then MAX(tokens) = AVG(tokens) and the user should be excluded.
This allows the entire problem to be solved using a single grouping operation.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(U × N) | O(1) | Repeatedly scans rows for each user |
| Optimal | O(N) | O(U) | Single grouping pass using aggregate functions |
Algorithm Walkthrough
- Group all rows by
user_id. - For each user group, compute:
COUNT(*)as the total number of prompts.AVG(tokens)as the average token usage.MAX(tokens)as the largest token count submitted by that user.
- Apply the first filter using a
HAVINGclause:
- Keep only users where
COUNT(*) >= 3.
- Apply the second filter:
- Keep only users where
MAX(tokens) > AVG(tokens). - This guarantees that at least one prompt has a token count larger than the user's average.
- Round the average token count to two decimal places using
ROUND(). - Sort the remaining rows by:
- Average tokens descending.
- User ID ascending.
- Return the resulting table.
Why it works
The algorithm works because grouping by user_id collects all prompts belonging to the same user into a single aggregate record. COUNT(*) correctly measures prompt frequency, AVG(tokens) computes the user's average token usage, and MAX(tokens) identifies the largest prompt token count.
A user satisfies the second condition exactly when there exists some prompt whose token count exceeds the average. Since the largest token value is the greatest candidate, checking MAX(tokens) > AVG(tokens) is both necessary and sufficient.
Python Solution
LeetCode Database problems are solved using SQL rather than Python. The equivalent SQL solution is:
# This problem is a SQL database problem.
# No Python implementation is required on LeetCode.
The platform expects a SQL query because the input is stored in a relational table. The solution relies on SQL aggregation functions and a HAVING clause rather than procedural code.
Go Solution
LeetCode Database problems are solved using SQL rather than Go. The equivalent SQL solution is:
// This problem is a SQL database problem.
// No Go implementation is required on LeetCode.
Since the judge executes SQL directly against the provided tables, there is no Go function signature or implementation for this problem.
SQL Solution
SELECT
user_id,
COUNT(*) AS prompt_count,
ROUND(AVG(tokens), 2) AS avg_tokens
FROM prompts
GROUP BY user_id
HAVING COUNT(*) >= 3
AND MAX(tokens) > AVG(tokens)
ORDER BY avg_tokens DESC, user_id ASC;
Implementation Explanation
The query groups all records by user_id, allowing aggregate statistics to be calculated for each user.
COUNT(*) computes the total number of prompts submitted by the user. This value is returned as prompt_count.
AVG(tokens) computes the user's average token usage. The result is rounded to two decimal places using ROUND(AVG(tokens), 2).
The HAVING clause filters aggregated groups. COUNT(*) >= 3 enforces the minimum prompt requirement. MAX(tokens) > AVG(tokens) guarantees that at least one prompt exceeds the user's average token usage.
Finally, the result is sorted according to the required ordering: descending average tokens and ascending user ID.
Worked Examples
Example 1
Input:
| user_id | tokens |
|---|---|
| 1 | 120 |
| 1 | 80 |
| 1 | 200 |
| 2 | 60 |
| 2 | 70 |
| 3 | 300 |
| 3 | 250 |
| 3 | 180 |
| 3 | 220 |
Aggregation Results
| user_id | count | avg | max |
|---|---|---|---|
| 1 | 3 | 133.3333 | 200 |
| 2 | 2 | 65.0 | 70 |
| 3 | 4 | 237.5 | 300 |
Apply Filters
For User 1:
| Condition | Result |
|---|---|
| count >= 3 | True |
| max > avg | 200 > 133.33, True |
User 1 remains.
For User 2:
| Condition | Result |
|---|---|
| count >= 3 | False |
User 2 is removed.
For User 3:
| Condition | Result |
|---|---|
| count >= 3 | True |
| max > avg | 300 > 237.5, True |
User 3 remains.
Final Output
| user_id | prompt_count | avg_tokens |
|---|---|---|
| 3 | 4 | 237.50 |
| 1 | 3 | 133.33 |
The ordering is determined by descending average token usage.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N) | Each row participates in a single grouping operation |
| Space | O(U) | Aggregated information is stored per user |
The query performs one aggregation pass over the N rows in the table. Database engines typically maintain aggregate state for each distinct user, resulting in storage proportional to the number of users U.
Test Cases
Since this is a SQL problem, the following examples illustrate expected behavior.
# Example 1
# User 1 qualifies.
# User 2 has fewer than 3 prompts.
# User 3 qualifies.
# All prompts equal
# user_id=1 -> tokens [100,100,100]
# avg=100, max=100
# max is not greater than avg
# Should be excluded.
# Exactly 3 prompts
# user_id=2 -> tokens [10,20,30]
# avg=20, max=30
# Should be included.
# Fewer than 3 prompts
# user_id=3 -> tokens [100,200]
# Should be excluded.
# Single user with non-integer average
# user_id=4 -> tokens [100,101,102]
# avg=101.00
# max=102
# Should be included.
# Multiple users with equal averages
# Verify secondary ordering by user_id ascending.
Test Summary
| Test | Why |
|---|---|
| Problem example | Validates the basic requirements |
| All prompts equal | Ensures MAX(tokens) > AVG(tokens) is enforced |
| Exactly three prompts | Validates boundary condition for inclusion |
| Fewer than three prompts | Validates minimum prompt count filter |
| Non-integer average | Verifies rounding behavior |
| Equal averages across users | Verifies secondary sorting by user ID |
Edge Cases
Users With Identical Token Counts
A user might submit three or more prompts where every prompt uses the same number of tokens. For example, [100, 100, 100] produces an average of 100 and a maximum of 100. Since no prompt is strictly greater than the average, the user must be excluded. The condition MAX(tokens) > AVG(tokens) correctly handles this scenario.
Users With Exactly Three Prompts
The requirement states that users must have submitted at least three prompts. A common mistake is accidentally requiring more than three prompts. The condition COUNT(*) >= 3 correctly includes users with exactly three submissions.
Averages Requiring Rounding
Token averages may contain many decimal places. For example, [120, 80, 200] produces 133.3333.... The problem requires displaying averages rounded to two decimal places. Using ROUND(AVG(tokens), 2) guarantees the expected output format.
Multiple Users Sharing the Same Average
Different users may have identical average token usage. In such cases, sorting only by average is insufficient. The query includes user_id ASC as a secondary ordering criterion, ensuring deterministic and correct output ordering.