LeetCode 3564 - Seasonal Sales Analysis

This problem asks us to analyze sales data and determine the most popular product category for each season based on the total quantity sold.

LeetCode Problem 3564

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem asks us to analyze sales data and determine the most popular product category for each season based on the total quantity sold. The input consists of two tables: sales, which records individual transactions including product ID, date, quantity, and price; and products, which maps each product ID to its name and category. The expected output is a table with one row per season containing the season name, the most popular category, the total quantity sold for that category, and the total revenue generated.

Popularity is defined first by the total quantity sold. In the event of a tie in quantity, the category with higher total revenue (quantity × price) is chosen. If there is still a tie, the lexicographically smaller category is selected. The seasons are defined as: Winter (Dec-Feb), Spring (Mar-May), Summer (Jun-Aug), and Fall (Sep-Nov). The output must be ordered by season in ascending order: Fall, Spring, Summer, Winter.

Edge cases to consider include ties in total quantity, ties in both quantity and revenue, products with zero sales in a season, and ensuring correct handling of year boundaries (e.g., December and January spanning two different years). The problem guarantees that sale_id and product_id are unique, which simplifies joins and aggregations.

Approaches

The brute-force approach would iterate over all sales, group them by season and category, calculate total quantities and revenue for each group, and then determine the maximum manually for each season. This approach works correctly but becomes cumbersome and inefficient as the dataset grows, since it requires multiple nested passes and repeated aggregation operations.

The optimal approach leverages SQL aggregation and window functions to efficiently compute totals and rank categories within each season. The key insight is that we can map each month to its season, aggregate by season and category to compute total quantity and revenue, and then use a ranking function to select the top category per season while automatically handling ties by revenue and lexicographical order. This approach is efficient because all calculations are done in a single query pass without repeated scans of the data.

Approach Time Complexity Space Complexity Notes
Brute Force O(N×C) O(C×S) Iterate over each sale and manually compute totals per season-category pair, then determine maximums.
Optimal O(N) O(C×S) Aggregate sales by season and category, then use window functions to rank and select the top category per season.

Algorithm Walkthrough

  1. Join the sales table with the products table on product_id to associate each sale with its category.
  2. Convert the sale_date to its corresponding season. Map months 12, 1, 2 to Winter; 3, 4, 5 to Spring; 6, 7, 8 to Summer; 9, 10, 11 to Fall.
  3. Aggregate the sales by season and category, computing total_quantity as the sum of quantity and total_revenue as the sum of quantity × price.
  4. Use a window function to rank categories within each season. Rank by total_quantity descending, then by total_revenue descending, then by category ascending to resolve ties.
  5. Select the top-ranked category (rank = 1) for each season.
  6. Order the result by season according to the ascending order Fall, Spring, Summer, Winter.

Why it works: By aggregating sales per season and category first, we capture all necessary metrics to evaluate popularity. The ranking ensures ties are resolved according to the problem requirements. Selecting only the top-ranked row per season guarantees that the most popular category is returned for each season in the correct order.

Python Solution

import pandas as pd

def seasonal_sales_analysis(sales: pd.DataFrame, products: pd.DataFrame) -> pd.DataFrame:
    # Join sales with products to get category
    df = sales.merge(products, on="product_id")
    
    # Map month to season
    def month_to_season(month: int) -> str:
        if month in [12, 1, 2]:
            return "Winter"
        elif month in [3, 4, 5]:
            return "Spring"
        elif month in [6, 7, 8]:
            return "Summer"
        else:
            return "Fall"
    
    df['season'] = df['sale_date'].dt.month.apply(month_to_season)
    
    # Aggregate totals per season-category
    agg = df.groupby(['season', 'category'], as_index=False).agg(
        total_quantity=('quantity', 'sum'),
        total_revenue=('price', lambda x: (x * df.loc[x.index, 'quantity']).sum())
    )
    
    # Rank categories per season
    agg['rank'] = agg.groupby('season') \
        .apply(lambda x: x.sort_values(
            by=['total_quantity', 'total_revenue', 'category'],
            ascending=[False, False, True]
        ).reset_index(drop=True).reset_index().rename(columns={'index':'rank'})['rank']) \
        .reset_index(drop=True)
    
    # Select top category per season
    top_categories = agg[agg['rank'] == 0].copy()
    
    # Order seasons
    season_order = ['Fall', 'Spring', 'Summer', 'Winter']
    top_categories['season'] = pd.Categorical(top_categories['season'], categories=season_order, ordered=True)
    
    return top_categories[['season', 'category', 'total_quantity', 'total_revenue']].sort_values('season')

Implementation walkthrough: We first join sales with products to get the category for each sale. The month_to_season function converts sale dates to seasons. We aggregate total quantity and total revenue per season-category combination. Using groupby and sorting, we rank the categories within each season to handle ties. Finally, we select the top-ranked category for each season and order by the predefined season order.

Go Solution

package main

import (
    "database/sql"
    "fmt"
)

func SeasonalSalesAnalysis(db *sql.DB) (*sql.Rows, error) {
    query := `
    WITH sales_with_category AS (
        SELECT s.sale_id, s.product_id, s.sale_date, s.quantity, s.price, p.category,
        CASE
            WHEN MONTH(s.sale_date) IN (12,1,2) THEN 'Winter'
            WHEN MONTH(s.sale_date) IN (3,4,5) THEN 'Spring'
            WHEN MONTH(s.sale_date) IN (6,7,8) THEN 'Summer'
            ELSE 'Fall'
        END AS season
        FROM sales s
        JOIN products p ON s.product_id = p.product_id
    ),
    agg AS (
        SELECT season, category, SUM(quantity) AS total_quantity, SUM(quantity * price) AS total_revenue
        FROM sales_with_category
        GROUP BY season, category
    ),
    ranked AS (
        SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY season
            ORDER BY total_quantity DESC, total_revenue DESC, category ASC
        ) AS rank
        FROM agg
    )
    SELECT season, category, total_quantity, total_revenue
    FROM ranked
    WHERE rank = 1
    ORDER BY FIELD(season, 'Fall', 'Spring', 'Summer', 'Winter');
    `
    return db.Query(query)
}

Go-specific notes: The Go version uses SQL directly via database/sql. The ROW_NUMBER() window function handles ranking, and FIELD ensures seasons are ordered correctly. Since Go interacts with the database, we return *sql.Rows to allow iteration over the result set. Overflow and nil values are naturally handled by SQL.

Worked Examples

Fall

Aggregated data:

category total_quantity total_revenue
Apparel 10 120
Fitness 3 36

Most popular: Apparel (highest quantity)

Spring

category total_quantity total_revenue
Kitchen 3 54
Tech 1 20
Apparel 2 20

Most popular: Kitchen (highest quantity)

Summer

category total_quantity total_revenue
Tech 5 100
Fitness 5 60
Apparel 4 60

Most popular: Tech (tie on quantity but higher revenue)

Winter

category total_quantity total_revenue
Apparel 9 110
Kitchen 6 108
Tech 3 60
Fitness 2 24

Most popular: Apparel (highest quantity)

Complexity Analysis

Measure Complexity Explanation
Time O(N) We scan all sales once to join and aggregate, and ranking within seasons is negligible compared to N.
Space O(C×S) We store aggregated data per category per season.

Since the number of seasons is constant (4) and the number of categories is generally much smaller than the number of sales, the space requirement is efficient.

Test Cases

import pandas as pd

# Provided example
sales = pd.DataFrame({
    'sale_id':[1,2,3,4,5,6,7,8,9,10,11