Retrieving and Analyzing data with SQL:�The Look E-Commerce Dataset
By: Muhammad Farhan Abdillah
REVOU ASSIGNMENT
Introduction
With The Look Ecommerce Dataset in Google BigQuery, I answer the questions below with SQL:
INTERMEDIATE ASSIGNMENT:
1. Create a query to get the total users who completed the order and total orders per month (Jan 2019 until Apr 2022)!
2. Create a query to get average order value and total number of unique users, grouped by month (Jan 2019 until Apr 2022)!
3. Find the first and last name of users from the youngest and oldest age of each gender (Jan 2019 until Apr 2022)!
4. Get the top 5 most profitable product and its profit detail breakdown by month
5. Create a query to get Month to Date of total revenue in each product categories of past 3 months (current date 15 April 2022), breakdown by date!
ADVANCED ASSIGNMENT:
6. Find monthly growth of TPO (# of completed orders) and TPV (# of revenue) in percentage breakdown by product categories, ordered by time descendingly (Jan 2019 until Apr 2022). After analyzing the monthly growth, is there any interesting insight that we can get?
7. Create monthly retention cohorts (the groups, or cohorts, can be defined based upon the date that a user purchased a product) and then how many of them (%) coming back for the following months in 2019 – 2022. After analyzing the retention cohort, is there any interesting insight that we can get?
Understanding
Table and Variable
Entity Relationship Diagram of the dataset
INTERMEDIATE ASSIGNMENT Answer
Question 1: Table Schema and Result
All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426
SELECT
FORMAT_DATE("%Y-%m", created_at) Month,
COUNT(order_id) total_order,
COUNT(DISTINCT(user_id)) total_user
FROM
`bigquery-public-data.thelook_ecommerce.orders`
WHERE
CAST(created_at AS date) BETWEEN "2019-01-01"
AND "2022-04-30"
AND status="Complete"
GROUP BY
1
ORDER BY
1;
Question 1 Syntax:
Question 2: Table Schema and Result
Question 2 Syntax:�
SELECT
FORMAT_DATE("%Y-%m",created_at) month_date,
ROUND(AVG(sale_price),2) sales_average,
COUNT(DISTINCT(user_id)) total_user
FROM
`bigquery-public-data.thelook_ecommerce.order_items`
WHERE
CAST(created_at AS date) BETWEEN "2019-01-01"
AND "2022-04-30"
GROUP BY
1
ORDER BY
1;
All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426
Question 3: Table Schema and Result
Question 3 Syntax:�
SELECT
first_name,
last_name,
gender,
age,
CASE
WHEN age_first_rank = 1 AND gender="F" THEN "Yes"
WHEN age_first_rank = 1
AND gender="M" THEN "Yes"
ELSE
"No"
END
the_oldest,
CASE
WHEN age_lowest_rank = 1 AND gender="M" THEN "Yes"
WHEN age_lowest_rank = 1
AND gender="F" THEN "Yes"
ELSE
"No"
END
the_youngest
FROM (
SELECT
first_name,
last_name,
gender,
age,
RANK() OVER(PARTITION BY gender ORDER BY age DESC) age_first_rank,
RANK() OVER(PARTITION BY gender ORDER BY age ASC) age_lowest_rank,
FROM
`bigquery-public-data.thelook_ecommerce.users`
WHERE
CAST(created_at AS date) BETWEEN "2019-01-01"
AND "2022-04-30")
WHERE
age_first_rank =1
OR age_lowest_rank=1;
All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426
Question 4: Table Schema and Result
Question 4: Table Schema and Result
Question 4 Syntax:�
WITH
a as
(SELECT
FORMAT_DATE("%Y-%m",i.created_at) month_date,
i.product_id,
p.name,
ROUND(SUM(i.sale_price), 2) sum_sales,
ROUND(SUM(p.cost),2) sum_cost,
ROUND(SUM(i.sale_price)-SUM(p.cost),2) profit
FROM
`bigquery-public-data.thelook_ecommerce.order_items` i
LEFT JOIN
`bigquery-public-data.thelook_ecommerce.products` AS p
ON
i.product_id = p.id
GROUP BY
1,
2,
3),
b as
(SELECT
month_date,
product_id,
name,
sum_sales,
sum_cost,
profit,
RANK() OVER(PARTITION BY month_date ORDER BY
profit DESC) ranking_permonth
FROM a)
SELECT *
FROM b
WHERE
ranking_permonth <=5
ORDER BY
month_date ASC,
profit DESC;
All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426
Question 5: Table Schema and Result
Question 5 Syntax:�
SELECT
date_time,
ROUND(sum_sales,2) revenue_percategory,
ROUND(SUM(sum_sales) OVER(PARTITION BY category, FORMAT_DATE("%Y-%m",date_time)
ORDER BY
date_time),2) revenue_percategory_monthtodate,
category
FROM (
SELECT
EXTRACT(date
FROM
i.created_at) date_time,
SUM(i.sale_price) sum_sales,
p.category
FROM
`bigquery-public-data.thelook_ecommerce.order_items` i
LEFT JOIN
`bigquery-public-data.thelook_ecommerce.products` AS p
ON
i.product_id = p.id
WHERE
CAST(created_at AS date) BETWEEN "2022-02-01"
AND "2022-04-15"
GROUP BY
date_time,
category);
All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426
ADVANCED ASSIGNMENT Answer
Question 6: Table Schema and Result
Question 6 Syntax:�
WITH
a as
(SELECT
date(Date_trunc(o.created_at, month)) Month,
p.category,
COUNT(o.status) complete_order,
SUM(i.sale_price) revenue
FROM `bigquery-public-data.thelook_ecommerce.orders` o
INNER JOIN `bigquery-public-data.thelook_ecommerce.order_items` i
ON o.order_id = i.order_id
INNER JOIN `bigquery-public-data.thelook_ecommerce.products` p
ON i.product_id =p.id
WHERE o.status = "Complete"
GROUP BY 1,2),
b as
(SELECT
Month,
category,
complete_order,
LAG(complete_order,1) OVER(partition by category ORDER BY Month) previous_complete_order,
revenue,
LAG(revenue,1) OVER(partition by category ORDER BY Month) previous_revenue
FROM a)
SELECT
Month,
category,
round((complete_order-previous_complete_order)/previous_complete_order,2) complete_order_growth,
round((revenue-previous_revenue)/previous_revenue,2) revenue_growth
FROM b
WHERE month BETWEEN "2019-01-01" AND "2022-04-30"
ORDER BY 1 ASC, 2;
All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426
Insight from Question 6 Table
Question 7: Table Schema and Result
Question 7 Syntax:�
WITH
a AS (
SELECT
user_id,
MIN(DATE(DATE_TRUNC(created_at, month))) OVER(PARTITION BY user_id) cohort_month,
DATE(DATE_TRUNC(created_at, month)) order_time
FROM
`bigquery-public-data.thelook_ecommerce.orders`),
b AS (
SELECT
*,
DATE_DIFF(order_time, cohort_month, month) month
FROM
a),
c AS (
SELECT
cohort_month,
month,
COUNT(user_id) count_order
FROM
b
WHERE
month<=12
GROUP BY
1,
2),
d AS (
SELECT
cohort_month,
count_order count_order_inmonth
FROM
c
WHERE
month = 0)
SELECT
c.cohort_month,
c.month,
d.count_order_inmonth,
c.count_order,
round(c.count_order/d.count_order_inmonth,3) percentage
FROM
c
LEFT JOIN
d
ON
c.cohort_month = d.cohort_month
WHERE
c.cohort_month BETWEEN "2021-06-01"
AND "2022-06-01"
ORDER BY
1,
2;
All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426
Cohort Analysis for Question 7
Insight: