LeetCode 3564 - Seasonal Sales Analysis
This problem asks us to analyze sales data and determine the most popular product category for each season based on the total quantity sold.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This problem asks us to analyze sales data and determine the most popular product category for each season based on the total quantity sold. The input consists of two tables: sales, which records individual transactions including product ID, date, quantity, and price; and products, which maps each product ID to its name and category. The expected output is a table with one row per season containing the season name, the most popular category, the total quantity sold for that category, and the total revenue generated.
Popularity is defined first by the total quantity sold. In the event of a tie in quantity, the category with higher total revenue (quantity × price) is chosen. If there is still a tie, the lexicographically smaller category is selected. The seasons are defined as: Winter (Dec-Feb), Spring (Mar-May), Summer (Jun-Aug), and Fall (Sep-Nov). The output must be ordered by season in ascending order: Fall, Spring, Summer, Winter.
Edge cases to consider include ties in total quantity, ties in both quantity and revenue, products with zero sales in a season, and ensuring correct handling of year boundaries (e.g., December and January spanning two different years). The problem guarantees that sale_id and product_id are unique, which simplifies joins and aggregations.
Approaches
The brute-force approach would iterate over all sales, group them by season and category, calculate total quantities and revenue for each group, and then determine the maximum manually for each season. This approach works correctly but becomes cumbersome and inefficient as the dataset grows, since it requires multiple nested passes and repeated aggregation operations.
The optimal approach leverages SQL aggregation and window functions to efficiently compute totals and rank categories within each season. The key insight is that we can map each month to its season, aggregate by season and category to compute total quantity and revenue, and then use a ranking function to select the top category per season while automatically handling ties by revenue and lexicographical order. This approach is efficient because all calculations are done in a single query pass without repeated scans of the data.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N×C) | O(C×S) | Iterate over each sale and manually compute totals per season-category pair, then determine maximums. |
| Optimal | O(N) | O(C×S) | Aggregate sales by season and category, then use window functions to rank and select the top category per season. |
Algorithm Walkthrough
- Join the
salestable with theproductstable onproduct_idto associate each sale with its category. - Convert the
sale_dateto its corresponding season. Map months 12, 1, 2 to Winter; 3, 4, 5 to Spring; 6, 7, 8 to Summer; 9, 10, 11 to Fall. - Aggregate the sales by season and category, computing
total_quantityas the sum ofquantityandtotal_revenueas the sum ofquantity × price. - Use a window function to rank categories within each season. Rank by
total_quantitydescending, then bytotal_revenuedescending, then bycategoryascending to resolve ties. - Select the top-ranked category (rank = 1) for each season.
- Order the result by season according to the ascending order Fall, Spring, Summer, Winter.
Why it works: By aggregating sales per season and category first, we capture all necessary metrics to evaluate popularity. The ranking ensures ties are resolved according to the problem requirements. Selecting only the top-ranked row per season guarantees that the most popular category is returned for each season in the correct order.
Python Solution
import pandas as pd
def seasonal_sales_analysis(sales: pd.DataFrame, products: pd.DataFrame) -> pd.DataFrame:
# Join sales with products to get category
df = sales.merge(products, on="product_id")
# Map month to season
def month_to_season(month: int) -> str:
if month in [12, 1, 2]:
return "Winter"
elif month in [3, 4, 5]:
return "Spring"
elif month in [6, 7, 8]:
return "Summer"
else:
return "Fall"
df['season'] = df['sale_date'].dt.month.apply(month_to_season)
# Aggregate totals per season-category
agg = df.groupby(['season', 'category'], as_index=False).agg(
total_quantity=('quantity', 'sum'),
total_revenue=('price', lambda x: (x * df.loc[x.index, 'quantity']).sum())
)
# Rank categories per season
agg['rank'] = agg.groupby('season') \
.apply(lambda x: x.sort_values(
by=['total_quantity', 'total_revenue', 'category'],
ascending=[False, False, True]
).reset_index(drop=True).reset_index().rename(columns={'index':'rank'})['rank']) \
.reset_index(drop=True)
# Select top category per season
top_categories = agg[agg['rank'] == 0].copy()
# Order seasons
season_order = ['Fall', 'Spring', 'Summer', 'Winter']
top_categories['season'] = pd.Categorical(top_categories['season'], categories=season_order, ordered=True)
return top_categories[['season', 'category', 'total_quantity', 'total_revenue']].sort_values('season')
Implementation walkthrough: We first join sales with products to get the category for each sale. The month_to_season function converts sale dates to seasons. We aggregate total quantity and total revenue per season-category combination. Using groupby and sorting, we rank the categories within each season to handle ties. Finally, we select the top-ranked category for each season and order by the predefined season order.
Go Solution
package main
import (
"database/sql"
"fmt"
)
func SeasonalSalesAnalysis(db *sql.DB) (*sql.Rows, error) {
query := `
WITH sales_with_category AS (
SELECT s.sale_id, s.product_id, s.sale_date, s.quantity, s.price, p.category,
CASE
WHEN MONTH(s.sale_date) IN (12,1,2) THEN 'Winter'
WHEN MONTH(s.sale_date) IN (3,4,5) THEN 'Spring'
WHEN MONTH(s.sale_date) IN (6,7,8) THEN 'Summer'
ELSE 'Fall'
END AS season
FROM sales s
JOIN products p ON s.product_id = p.product_id
),
agg AS (
SELECT season, category, SUM(quantity) AS total_quantity, SUM(quantity * price) AS total_revenue
FROM sales_with_category
GROUP BY season, category
),
ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY season
ORDER BY total_quantity DESC, total_revenue DESC, category ASC
) AS rank
FROM agg
)
SELECT season, category, total_quantity, total_revenue
FROM ranked
WHERE rank = 1
ORDER BY FIELD(season, 'Fall', 'Spring', 'Summer', 'Winter');
`
return db.Query(query)
}
Go-specific notes: The Go version uses SQL directly via database/sql. The ROW_NUMBER() window function handles ranking, and FIELD ensures seasons are ordered correctly. Since Go interacts with the database, we return *sql.Rows to allow iteration over the result set. Overflow and nil values are naturally handled by SQL.
Worked Examples
Fall
Aggregated data:
| category | total_quantity | total_revenue |
|---|---|---|
| Apparel | 10 | 120 |
| Fitness | 3 | 36 |
Most popular: Apparel (highest quantity)
Spring
| category | total_quantity | total_revenue |
|---|---|---|
| Kitchen | 3 | 54 |
| Tech | 1 | 20 |
| Apparel | 2 | 20 |
Most popular: Kitchen (highest quantity)
Summer
| category | total_quantity | total_revenue |
|---|---|---|
| Tech | 5 | 100 |
| Fitness | 5 | 60 |
| Apparel | 4 | 60 |
Most popular: Tech (tie on quantity but higher revenue)
Winter
| category | total_quantity | total_revenue |
|---|---|---|
| Apparel | 9 | 110 |
| Kitchen | 6 | 108 |
| Tech | 3 | 60 |
| Fitness | 2 | 24 |
Most popular: Apparel (highest quantity)
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N) | We scan all sales once to join and aggregate, and ranking within seasons is negligible compared to N. |
| Space | O(C×S) | We store aggregated data per category per season. |
Since the number of seasons is constant (4) and the number of categories is generally much smaller than the number of sales, the space requirement is efficient.
Test Cases
import pandas as pd
# Provided example
sales = pd.DataFrame({
'sale_id':[1,2,3,4,5,6,7,8,9,10,11