LeetCode 3793 - Find Users with High Token Usage

Difficulty: 🟢 Easy
Topics: —

Solution

Problem Understanding

The prompts table records every prompt submitted by a user to an AI system. Each row contains a user_id, the prompt text itself, and the number of tokens consumed by that prompt.

The task is to analyze prompt usage on a per-user basis and return only users who satisfy two separate conditions.

First, the user must have submitted at least three prompts. This means we need to count how many rows belong to each user and filter out users with fewer than three submissions.

Second, the user must have at least one prompt whose token count is strictly greater than that user's own average token usage. The comparison is not against a global average, but against the average calculated only from that user's prompts.

For every qualifying user, we must output:

user_id
prompt_count, the total number of prompts submitted
avg_tokens, the average number of tokens per prompt rounded to two decimal places

The final result must be sorted by:

avg_tokens in descending order
user_id in ascending order when averages are equal

The input is a relational table rather than an array or graph. This is a SQL aggregation problem where we need to compute per-user statistics and then filter users based on those statistics.

An important observation is that if a user has multiple prompts and not all prompt token values are identical, then at least one prompt must be greater than the average. Conversely, if every prompt has exactly the same token count, then no prompt is greater than the average. The query therefore needs to verify the existence of a prompt with tokens > average_tokens_for_that_user.

Important edge cases include users with exactly three prompts, users whose prompts all have identical token counts, users with only one or two prompts, and users whose average is a non-integer value requiring rounding in the output.

Approaches

Brute Force Approach

A brute force solution would process each user independently. For every distinct user, we could scan the entire table to collect that user's prompts, calculate the count and average, then scan the user's prompts again to determine whether any prompt exceeds the average.

This approach is correct because it explicitly computes all required statistics and directly checks the filtering condition. However, it repeatedly scans the table for each user. If there are U users and N rows, the total work can approach O(U × N), which becomes inefficient as the dataset grows.

Key Insight

The key observation is that all required information is aggregated at the user level.

We need:

Count of prompts per user
Average tokens per user
Whether a maximum token value exceeds the average

Instead of repeatedly scanning the table, we can group rows by user_id once. SQL aggregation functions such as COUNT(), AVG(), and MAX() naturally compute these values.

The condition "at least one prompt greater than the user's average" can be expressed efficiently as:

MAX(tokens) > AVG(tokens)

If the maximum token count is greater than the average, then at least one prompt exceeds the average. If all prompts are equal, then MAX(tokens) = AVG(tokens) and the user should be excluded.

This allows the entire problem to be solved using a single grouping operation.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(U × N)	O(1)	Repeatedly scans rows for each user
Optimal	O(N)	O(U)	Single grouping pass using aggregate functions

Algorithm Walkthrough

Group all rows by user_id.
For each user group, compute:

COUNT(*) as the total number of prompts.
AVG(tokens) as the average token usage.
MAX(tokens) as the largest token count submitted by that user.

Apply the first filter using a HAVING clause:

Keep only users where COUNT(*) >= 3.

Apply the second filter:

Keep only users where MAX(tokens) > AVG(tokens).
This guarantees that at least one prompt has a token count larger than the user's average.

Round the average token count to two decimal places using ROUND().
Sort the remaining rows by:

Average tokens descending.
User ID ascending.

Return the resulting table.

Why it works

The algorithm works because grouping by user_id collects all prompts belonging to the same user into a single aggregate record. COUNT(*) correctly measures prompt frequency, AVG(tokens) computes the user's average token usage, and MAX(tokens) identifies the largest prompt token count.

A user satisfies the second condition exactly when there exists some prompt whose token count exceeds the average. Since the largest token value is the greatest candidate, checking MAX(tokens) > AVG(tokens) is both necessary and sufficient.

Python Solution

LeetCode Database problems are solved using SQL rather than Python. The equivalent SQL solution is:

# This problem is a SQL database problem.
# No Python implementation is required on LeetCode.

The platform expects a SQL query because the input is stored in a relational table. The solution relies on SQL aggregation functions and a HAVING clause rather than procedural code.

Go Solution

LeetCode Database problems are solved using SQL rather than Go. The equivalent SQL solution is:

// This problem is a SQL database problem.
// No Go implementation is required on LeetCode.

Since the judge executes SQL directly against the provided tables, there is no Go function signature or implementation for this problem.

SQL Solution

SELECT
    user_id,
    COUNT(*) AS prompt_count,
    ROUND(AVG(tokens), 2) AS avg_tokens
FROM prompts
GROUP BY user_id
HAVING COUNT(*) >= 3
   AND MAX(tokens) > AVG(tokens)
ORDER BY avg_tokens DESC, user_id ASC;

Implementation Explanation

The query groups all records by user_id, allowing aggregate statistics to be calculated for each user.

COUNT(*) computes the total number of prompts submitted by the user. This value is returned as prompt_count.

AVG(tokens) computes the user's average token usage. The result is rounded to two decimal places using ROUND(AVG(tokens), 2).

The HAVING clause filters aggregated groups. COUNT(*) >= 3 enforces the minimum prompt requirement. MAX(tokens) > AVG(tokens) guarantees that at least one prompt exceeds the user's average token usage.

Finally, the result is sorted according to the required ordering: descending average tokens and ascending user ID.

Worked Examples

Example 1

Input:

user_id	tokens
1	120
1	80
1	200
2	60
2	70
3	300
3	250
3	180
3	220

Aggregation Results

user_id	count	avg	max
1	3	133.3333	200
2	2	65.0	70
3	4	237.5	300

Apply Filters

For User 1:

Condition	Result
count >= 3	True
max > avg	200 > 133.33, True

User 1 remains.

For User 2:

Condition	Result
count >= 3	False

User 2 is removed.

For User 3:

Condition	Result
count >= 3	True
max > avg	300 > 237.5, True

User 3 remains.

Final Output

user_id	prompt_count	avg_tokens
3	4	237.50
1	3	133.33

The ordering is determined by descending average token usage.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(N)	Each row participates in a single grouping operation
Space	O(U)	Aggregated information is stored per user

The query performs one aggregation pass over the N rows in the table. Database engines typically maintain aggregate state for each distinct user, resulting in storage proportional to the number of users U.

Test Cases

Since this is a SQL problem, the following examples illustrate expected behavior.

# Example 1
# User 1 qualifies.
# User 2 has fewer than 3 prompts.
# User 3 qualifies.

# All prompts equal
# user_id=1 -> tokens [100,100,100]
# avg=100, max=100
# max is not greater than avg
# Should be excluded.

# Exactly 3 prompts
# user_id=2 -> tokens [10,20,30]
# avg=20, max=30
# Should be included.

# Fewer than 3 prompts
# user_id=3 -> tokens [100,200]
# Should be excluded.

# Single user with non-integer average
# user_id=4 -> tokens [100,101,102]
# avg=101.00
# max=102
# Should be included.

# Multiple users with equal averages
# Verify secondary ordering by user_id ascending.

Test Summary

Test	Why
Problem example	Validates the basic requirements
All prompts equal	Ensures `MAX(tokens) > AVG(tokens)` is enforced
Exactly three prompts	Validates boundary condition for inclusion
Fewer than three prompts	Validates minimum prompt count filter
Non-integer average	Verifies rounding behavior
Equal averages across users	Verifies secondary sorting by user ID

Edge Cases

Users With Identical Token Counts

A user might submit three or more prompts where every prompt uses the same number of tokens. For example, [100, 100, 100] produces an average of 100 and a maximum of 100. Since no prompt is strictly greater than the average, the user must be excluded. The condition MAX(tokens) > AVG(tokens) correctly handles this scenario.

Users With Exactly Three Prompts

The requirement states that users must have submitted at least three prompts. A common mistake is accidentally requiring more than three prompts. The condition COUNT(*) >= 3 correctly includes users with exactly three submissions.

Averages Requiring Rounding

Token averages may contain many decimal places. For example, [120, 80, 200] produces 133.3333.... The problem requires displaying averages rounded to two decimal places. Using ROUND(AVG(tokens), 2) guarantees the expected output format.

Different users may have identical average token usage. In such cases, sorting only by average is insufficient. The query includes user_id ASC as a secondary ordering criterion, ensuring deterministic and correct output ordering.