LeetCode 3601 - Find Drivers with Improved Fuel Efficiency

This problem asks us to analyze fuel efficiency trends for drivers across two different periods of the year. We are given two tables: - drivers, which stores each driver's identifier and name.

LeetCode Problem 3601

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem asks us to analyze fuel efficiency trends for drivers across two different periods of the year.

We are given two tables:

  • drivers, which stores each driver's identifier and name.
  • trips, which stores individual trip records, including distance traveled and fuel consumed.

For every trip, fuel efficiency is defined as:

$$\text{fuel efficiency} = \frac{\text{distance_km}}{\text{fuel_consumed}}$$

The year is divided into two halves:

  • First half: January through June
  • Second half: July through December

For each driver, we must compute:

  1. The average fuel efficiency across all trips in the first half.
  2. The average fuel efficiency across all trips in the second half.
  3. The efficiency improvement:

$$\text{efficiency improvement} = \text{second_half_avg} - \text{first_half_avg}$$

Only drivers who have at least one trip in both halves should be included in the final result.

The output must contain:

  • driver_id
  • driver_name
  • first_half_avg
  • second_half_avg
  • efficiency_improvement

All numeric values must be rounded to two decimal places.

The final result must be sorted by:

  1. efficiency_improvement descending
  2. driver_name ascending

The key detail is that we are averaging per-trip efficiencies, not dividing total distance by total fuel consumption for each half. Each trip contributes equally to the average regardless of distance.

Important Edge Cases

A driver may have trips only in the first half or only in the second half. Such drivers must be excluded.

A driver may have multiple trips in either half. Every trip's efficiency must be included in the corresponding average.

A driver may have exactly one trip in a half. In that case, the average equals that trip's efficiency.

The improvement can be positive, zero, or negative. The problem statement asks for drivers with improved fuel efficiency, but the SQL solution naturally computes and filters drivers with data in both halves, then orders by improvement. The expected solution returns the calculated improvement value.

Approaches

Brute Force Approach

The most direct approach is to process each driver independently.

For every driver, scan the entire trips table and collect:

  • All first-half trip efficiencies.
  • All second-half trip efficiencies.

After gathering those values, compute both averages and the improvement.

This approach is correct because every driver's result is computed from all of their trips. However, it repeatedly scans the entire trips table for every driver.

If there are D drivers and T trips, the complexity becomes:

$$O(D \times T)$$

which is unnecessarily expensive.

Optimal Approach

The key observation is that each trip only needs to be processed once.

We can first compute each trip's efficiency and classify it as either:

  • First half
  • Second half

Then, using conditional aggregation, we accumulate the averages for every driver in a single pass through the trips data.

SQL aggregation naturally groups trips by driver and computes:

  • First-half average efficiency
  • Second-half average efficiency

Drivers who do not have trips in both halves can be filtered using a HAVING clause.

This reduces the work to a single grouped aggregation over the trips table.

Approach Time Complexity Space Complexity Notes
Brute Force O(D × T) O(T) Scan all trips separately for each driver
Optimal O(T) O(D) Single aggregation grouped by driver

Algorithm Walkthrough

  1. Join the drivers table with the trips table so that every trip is associated with its driver information.
  2. For each trip, calculate its fuel efficiency using:

$$\frac{\text{distance_km}}{\text{fuel_consumed}}$$ 3. Determine whether the trip belongs to the first half or second half of the year by extracting the month from trip_date. 4. Use conditional aggregation to compute the first-half average efficiency:

AVG(
    CASE
        WHEN MONTH(trip_date) BETWEEN 1 AND 6
        THEN distance_km / fuel_consumed
    END
)
  1. Use another conditional aggregation to compute the second-half average efficiency:
AVG(
    CASE
        WHEN MONTH(trip_date) BETWEEN 7 AND 12
        THEN distance_km / fuel_consumed
    END
)
  1. Group all records by driver so that each driver produces a single aggregated row.
  2. Use a HAVING clause to ensure the driver has at least one trip in both halves.
  3. Compute the efficiency improvement as:
second_half_avg - first_half_avg
  1. Round all displayed numeric values to two decimal places.
  2. Sort the final result by improvement descending and driver name ascending.

Why it Works

The correctness follows from SQL's grouping semantics. Every trip is assigned to exactly one half of the year. Conditional aggregation computes the average efficiency using only trips belonging to the corresponding half. The HAVING clause guarantees that both averages are based on at least one trip. Therefore, the calculated improvement exactly matches the problem definition.

Python Solution

For LeetCode Database problems, the expected answer is an SQL query rather than executable Python code.

# Database problem
# The accepted submission is SQL, not Python code.

The platform executes an SQL query against the provided tables. No Python implementation is required because the problem belongs to the Database category.

Go Solution

Similarly, Database problems on LeetCode are solved using SQL rather than Go.

// Database problem
// The accepted submission is SQL, not Go code.

There are no Go-specific implementation details because the solution is expressed entirely in SQL.

SQL Solution

SELECT
    d.driver_id,
    d.driver_name,
    ROUND(
        AVG(
            CASE
                WHEN MONTH(t.trip_date) BETWEEN 1 AND 6
                THEN t.distance_km / t.fuel_consumed
            END
        ),
        2
    ) AS first_half_avg,
    ROUND(
        AVG(
            CASE
                WHEN MONTH(t.trip_date) BETWEEN 7 AND 12
                THEN t.distance_km / t.fuel_consumed
            END
        ),
        2
    ) AS second_half_avg,
    ROUND(
        AVG(
            CASE
                WHEN MONTH(t.trip_date) BETWEEN 7 AND 12
                THEN t.distance_km / t.fuel_consumed
            END
        )
        -
        AVG(
            CASE
                WHEN MONTH(t.trip_date) BETWEEN 1 AND 6
                THEN t.distance_km / t.fuel_consumed
            END
        ),
        2
    ) AS efficiency_improvement
FROM drivers d
JOIN trips t
    ON d.driver_id = t.driver_id
GROUP BY
    d.driver_id,
    d.driver_name
HAVING
    COUNT(
        CASE
            WHEN MONTH(t.trip_date) BETWEEN 1 AND 6
            THEN 1
        END
    ) > 0
    AND
    COUNT(
        CASE
            WHEN MONTH(t.trip_date) BETWEEN 7 AND 12
            THEN 1
        END
    ) > 0
ORDER BY
    efficiency_improvement DESC,
    driver_name ASC;

Implementation Explanation

The query begins by joining drivers and trips so that every trip can be associated with the corresponding driver name.

The first-half average is computed using a conditional AVG. Trips outside January through June return NULL, and AVG automatically ignores NULL values.

The same technique is used to compute the second-half average.

The improvement is calculated by subtracting the first-half average from the second-half average.

The HAVING clause ensures that both halves contain at least one trip. Without this condition, one of the averages would be NULL.

Finally, all numeric results are rounded to two decimal places and sorted according to the required ordering.

Worked Examples

Example 1

Input trips for Alice Johnson:

Trip Month Distance Fuel Efficiency
1 February 120.5 10.2 11.81
2 March 200.0 16.5 12.12
3 August 150.0 11.0 13.64
4 September 180.0 12.5 14.40

First-half aggregation:

Efficiency Values
11.81
12.12

$$\frac{11.81 + 12.12}{2} = 11.97$$

Second-half aggregation:

Efficiency Values
13.64
14.40

$$\frac{13.64 + 14.40}{2} = 14.02$$

Improvement:

$$14.02 - 11.97 = 2.05$$

Example 2

Input trips for Bob Smith:

Trip Month Distance Fuel Efficiency
5 January 100.0 9.0 11.11
6 April 250.0 22.0 11.36
7 October 200.0 15.0 13.33

First-half average:

$$\frac{11.11 + 11.36}{2} = 11.24$$

Second-half average:

$$13.33$$

Improvement:

$$13.33 - 11.24 = 2.09$$

Rounded:

$$2.10$$

Excluded Drivers

Driver First Half Trips Second Half Trips Included
Carol Davis Yes No No
David Wilson No Yes No
Emma Brown Yes No No

These drivers fail the requirement of having trips in both halves of the year.

Complexity Analysis

Measure Complexity Explanation
Time O(T) Each trip participates in one grouped aggregation
Space O(D) Aggregation state is maintained per driver

The database performs a single scan of the relevant trip records and maintains aggregate statistics for each driver group. Therefore the running time is linear in the number of trip rows, while memory usage is proportional to the number of distinct drivers being aggregated.

Test Cases

Since this is a SQL database problem, the following datasets can be used to validate correctness.

# Example from statement
# Driver 1 and Driver 2 should appear.
# Drivers 3, 4, and 5 should be excluded.

# Single trip in each half
# first_half_avg = trip1 efficiency
# second_half_avg = trip2 efficiency

# Multiple trips per half
# Verify averages are computed from per-trip efficiencies.

# Driver only in first half
# Should be excluded.

# Driver only in second half
# Should be excluded.

# Zero improvement
# first_half_avg == second_half_avg

# Negative improvement
# second_half_avg < first_half_avg

# Multiple drivers with same improvement
# Verify secondary ordering by driver_name ascending.

# Boundary month test
# June should belong to first half.
# July should belong to second half.

# Large number of trips
# Verify aggregation scales correctly.
Test Why
Provided example Validates the main scenario
One trip per half Tests minimum valid inclusion
Multiple trips per half Tests averaging logic
First-half-only driver Tests exclusion rule
Second-half-only driver Tests exclusion rule
Equal averages Tests zero improvement
Lower second-half average Tests negative improvement
Same improvement values Tests tie-breaking order
June and July trips Tests half-year classification
Large dataset Tests aggregation scalability

Edge Cases

Driver Has Trips Only in One Half

A common mistake is allowing drivers with a missing average on one side. Such drivers do not satisfy the requirement of comparison across both halves. The HAVING clause explicitly checks that each driver has at least one trip in January-June and at least one trip in July-December.

Driver Has Exactly One Trip in a Half

It is easy to incorrectly assume multiple trips exist in every period. The aggregation functions naturally handle this case. The average of a single value is that value itself, so no special logic is required.

Boundary Months June and July

Month classification bugs are common. June must belong to the first half and July must belong to the second half. Using BETWEEN 1 AND 6 and BETWEEN 7 AND 12 ensures that every month belongs to exactly one half with no overlap and no gaps.

Multiple Trips with Different Distances

A subtle bug would be computing:

$$\frac{\text{total distance}}{\text{total fuel}}$$

instead of averaging individual trip efficiencies. The problem explicitly defines efficiency per trip and then asks for the average fuel efficiency. The solution computes distance_km / fuel_consumed for each trip first and only then applies AVG, exactly matching the required definition.