You are working as a Data Engineer for an e-commerce company. The product team wants to identify the best-performing products within each category based on revenue. You are given two tables: products product_id (INTEGER): Unique identifier for each product product_name (TEXT): Name of the product category (TEXT): Product category order_items order_item_id (INTEGER): Unique identifier for each order item product_id (INTEGER): ID of the product quantity (INTEGER): Number of units sold price (NUMERIC): Price per unit Task: Write a SQL query to find the top 2 products in each category based on total revenue . Revenue for a product is calculated as: quantity * price Expected Output: category product_id product_name total_revenue The result should include only the top 2 products per category and be sorted by: category (ascending) total_revenue (descending)
Table Setup (SQL Schema)
-- DDL: Create tables
CREATE TABLE products (
product_id INTEGER PRIMARY KEY,
product_name TEXT,
category TEXT
);
CREATE TABLE order_items (
order_item_id INTEGER PRIMARY KEY,
product_id INTEGER,
quantity INTEGER,
price NUMERIC(10,2)
);
-- DML: Insert sample data
INSERT INTO products (product_id, product_name, category) VALUES
(1, 'iPhone 14', 'Electronics'),
(2, 'Samsung Galaxy S22', 'Electronics'),
(3, 'MacBook Pro', 'Electronics'),
(4, 'Nike Shoes', 'Fashion'),
(5, 'Adidas T-Shirt', 'Fashion'),
(6, 'Levi Jeans', 'Fashion');
INSERT INTO order_items (order_item_id, product_id, quantity, price) VALUES
(1, 1, 2, 800.00),
(2, 2, 3, 700.00),
(3, 3, 1, 1500.00),
(4, 4, 5, 100.00),
(5, 5, 4, 50.00),
(6, 6, 2, 120.00),
(7, 1, 1, 800.00),
(8, 2, 1, 700.00),
(9, 4, 2, 100.00),
(10, 5, 3, 50.00);