LeetCode 3601 - Find Drivers with Improved Fuel Efficiency
This problem asks us to analyze fuel efficiency trends for drivers across two different periods of the year. We are given two tables: - drivers, which stores each driver's identifier and name.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This problem asks us to analyze fuel efficiency trends for drivers across two different periods of the year.
We are given two tables:
drivers, which stores each driver's identifier and name.trips, which stores individual trip records, including distance traveled and fuel consumed.
For every trip, fuel efficiency is defined as:
$$\text{fuel efficiency} = \frac{\text{distance_km}}{\text{fuel_consumed}}$$
The year is divided into two halves:
- First half: January through June
- Second half: July through December
For each driver, we must compute:
- The average fuel efficiency across all trips in the first half.
- The average fuel efficiency across all trips in the second half.
- The efficiency improvement:
$$\text{efficiency improvement} = \text{second_half_avg} - \text{first_half_avg}$$
Only drivers who have at least one trip in both halves should be included in the final result.
The output must contain:
driver_iddriver_namefirst_half_avgsecond_half_avgefficiency_improvement
All numeric values must be rounded to two decimal places.
The final result must be sorted by:
efficiency_improvementdescendingdriver_nameascending
The key detail is that we are averaging per-trip efficiencies, not dividing total distance by total fuel consumption for each half. Each trip contributes equally to the average regardless of distance.
Important Edge Cases
A driver may have trips only in the first half or only in the second half. Such drivers must be excluded.
A driver may have multiple trips in either half. Every trip's efficiency must be included in the corresponding average.
A driver may have exactly one trip in a half. In that case, the average equals that trip's efficiency.
The improvement can be positive, zero, or negative. The problem statement asks for drivers with improved fuel efficiency, but the SQL solution naturally computes and filters drivers with data in both halves, then orders by improvement. The expected solution returns the calculated improvement value.
Approaches
Brute Force Approach
The most direct approach is to process each driver independently.
For every driver, scan the entire trips table and collect:
- All first-half trip efficiencies.
- All second-half trip efficiencies.
After gathering those values, compute both averages and the improvement.
This approach is correct because every driver's result is computed from all of their trips. However, it repeatedly scans the entire trips table for every driver.
If there are D drivers and T trips, the complexity becomes:
$$O(D \times T)$$
which is unnecessarily expensive.
Optimal Approach
The key observation is that each trip only needs to be processed once.
We can first compute each trip's efficiency and classify it as either:
- First half
- Second half
Then, using conditional aggregation, we accumulate the averages for every driver in a single pass through the trips data.
SQL aggregation naturally groups trips by driver and computes:
- First-half average efficiency
- Second-half average efficiency
Drivers who do not have trips in both halves can be filtered using a HAVING clause.
This reduces the work to a single grouped aggregation over the trips table.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(D × T) | O(T) | Scan all trips separately for each driver |
| Optimal | O(T) | O(D) | Single aggregation grouped by driver |
Algorithm Walkthrough
- Join the
driverstable with thetripstable so that every trip is associated with its driver information. - For each trip, calculate its fuel efficiency using:
$$\frac{\text{distance_km}}{\text{fuel_consumed}}$$
3. Determine whether the trip belongs to the first half or second half of the year by extracting the month from trip_date.
4. Use conditional aggregation to compute the first-half average efficiency:
AVG(
CASE
WHEN MONTH(trip_date) BETWEEN 1 AND 6
THEN distance_km / fuel_consumed
END
)
- Use another conditional aggregation to compute the second-half average efficiency:
AVG(
CASE
WHEN MONTH(trip_date) BETWEEN 7 AND 12
THEN distance_km / fuel_consumed
END
)
- Group all records by driver so that each driver produces a single aggregated row.
- Use a
HAVINGclause to ensure the driver has at least one trip in both halves. - Compute the efficiency improvement as:
second_half_avg - first_half_avg
- Round all displayed numeric values to two decimal places.
- Sort the final result by improvement descending and driver name ascending.
Why it Works
The correctness follows from SQL's grouping semantics. Every trip is assigned to exactly one half of the year. Conditional aggregation computes the average efficiency using only trips belonging to the corresponding half. The HAVING clause guarantees that both averages are based on at least one trip. Therefore, the calculated improvement exactly matches the problem definition.
Python Solution
For LeetCode Database problems, the expected answer is an SQL query rather than executable Python code.
# Database problem
# The accepted submission is SQL, not Python code.
The platform executes an SQL query against the provided tables. No Python implementation is required because the problem belongs to the Database category.
Go Solution
Similarly, Database problems on LeetCode are solved using SQL rather than Go.
// Database problem
// The accepted submission is SQL, not Go code.
There are no Go-specific implementation details because the solution is expressed entirely in SQL.
SQL Solution
SELECT
d.driver_id,
d.driver_name,
ROUND(
AVG(
CASE
WHEN MONTH(t.trip_date) BETWEEN 1 AND 6
THEN t.distance_km / t.fuel_consumed
END
),
2
) AS first_half_avg,
ROUND(
AVG(
CASE
WHEN MONTH(t.trip_date) BETWEEN 7 AND 12
THEN t.distance_km / t.fuel_consumed
END
),
2
) AS second_half_avg,
ROUND(
AVG(
CASE
WHEN MONTH(t.trip_date) BETWEEN 7 AND 12
THEN t.distance_km / t.fuel_consumed
END
)
-
AVG(
CASE
WHEN MONTH(t.trip_date) BETWEEN 1 AND 6
THEN t.distance_km / t.fuel_consumed
END
),
2
) AS efficiency_improvement
FROM drivers d
JOIN trips t
ON d.driver_id = t.driver_id
GROUP BY
d.driver_id,
d.driver_name
HAVING
COUNT(
CASE
WHEN MONTH(t.trip_date) BETWEEN 1 AND 6
THEN 1
END
) > 0
AND
COUNT(
CASE
WHEN MONTH(t.trip_date) BETWEEN 7 AND 12
THEN 1
END
) > 0
ORDER BY
efficiency_improvement DESC,
driver_name ASC;
Implementation Explanation
The query begins by joining drivers and trips so that every trip can be associated with the corresponding driver name.
The first-half average is computed using a conditional AVG. Trips outside January through June return NULL, and AVG automatically ignores NULL values.
The same technique is used to compute the second-half average.
The improvement is calculated by subtracting the first-half average from the second-half average.
The HAVING clause ensures that both halves contain at least one trip. Without this condition, one of the averages would be NULL.
Finally, all numeric results are rounded to two decimal places and sorted according to the required ordering.
Worked Examples
Example 1
Input trips for Alice Johnson:
| Trip | Month | Distance | Fuel | Efficiency |
|---|---|---|---|---|
| 1 | February | 120.5 | 10.2 | 11.81 |
| 2 | March | 200.0 | 16.5 | 12.12 |
| 3 | August | 150.0 | 11.0 | 13.64 |
| 4 | September | 180.0 | 12.5 | 14.40 |
First-half aggregation:
| Efficiency Values |
|---|
| 11.81 |
| 12.12 |
$$\frac{11.81 + 12.12}{2} = 11.97$$
Second-half aggregation:
| Efficiency Values |
|---|
| 13.64 |
| 14.40 |
$$\frac{13.64 + 14.40}{2} = 14.02$$
Improvement:
$$14.02 - 11.97 = 2.05$$
Example 2
Input trips for Bob Smith:
| Trip | Month | Distance | Fuel | Efficiency |
|---|---|---|---|---|
| 5 | January | 100.0 | 9.0 | 11.11 |
| 6 | April | 250.0 | 22.0 | 11.36 |
| 7 | October | 200.0 | 15.0 | 13.33 |
First-half average:
$$\frac{11.11 + 11.36}{2} = 11.24$$
Second-half average:
$$13.33$$
Improvement:
$$13.33 - 11.24 = 2.09$$
Rounded:
$$2.10$$
Excluded Drivers
| Driver | First Half Trips | Second Half Trips | Included |
|---|---|---|---|
| Carol Davis | Yes | No | No |
| David Wilson | No | Yes | No |
| Emma Brown | Yes | No | No |
These drivers fail the requirement of having trips in both halves of the year.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(T) | Each trip participates in one grouped aggregation |
| Space | O(D) | Aggregation state is maintained per driver |
The database performs a single scan of the relevant trip records and maintains aggregate statistics for each driver group. Therefore the running time is linear in the number of trip rows, while memory usage is proportional to the number of distinct drivers being aggregated.
Test Cases
Since this is a SQL database problem, the following datasets can be used to validate correctness.
# Example from statement
# Driver 1 and Driver 2 should appear.
# Drivers 3, 4, and 5 should be excluded.
# Single trip in each half
# first_half_avg = trip1 efficiency
# second_half_avg = trip2 efficiency
# Multiple trips per half
# Verify averages are computed from per-trip efficiencies.
# Driver only in first half
# Should be excluded.
# Driver only in second half
# Should be excluded.
# Zero improvement
# first_half_avg == second_half_avg
# Negative improvement
# second_half_avg < first_half_avg
# Multiple drivers with same improvement
# Verify secondary ordering by driver_name ascending.
# Boundary month test
# June should belong to first half.
# July should belong to second half.
# Large number of trips
# Verify aggregation scales correctly.
| Test | Why |
|---|---|
| Provided example | Validates the main scenario |
| One trip per half | Tests minimum valid inclusion |
| Multiple trips per half | Tests averaging logic |
| First-half-only driver | Tests exclusion rule |
| Second-half-only driver | Tests exclusion rule |
| Equal averages | Tests zero improvement |
| Lower second-half average | Tests negative improvement |
| Same improvement values | Tests tie-breaking order |
| June and July trips | Tests half-year classification |
| Large dataset | Tests aggregation scalability |
Edge Cases
Driver Has Trips Only in One Half
A common mistake is allowing drivers with a missing average on one side. Such drivers do not satisfy the requirement of comparison across both halves. The HAVING clause explicitly checks that each driver has at least one trip in January-June and at least one trip in July-December.
Driver Has Exactly One Trip in a Half
It is easy to incorrectly assume multiple trips exist in every period. The aggregation functions naturally handle this case. The average of a single value is that value itself, so no special logic is required.
Boundary Months June and July
Month classification bugs are common. June must belong to the first half and July must belong to the second half. Using BETWEEN 1 AND 6 and BETWEEN 7 AND 12 ensures that every month belongs to exactly one half with no overlap and no gaps.
Multiple Trips with Different Distances
A subtle bug would be computing:
$$\frac{\text{total distance}}{\text{total fuel}}$$
instead of averaging individual trip efficiencies. The problem explicitly defines efficiency per trip and then asks for the average fuel efficiency. The solution computes distance_km / fuel_consumed for each trip first and only then applies AVG, exactly matching the required definition.