1 of 23

Retrieving and Analyzing data with SQL:�The Look E-Commerce Dataset

By: Muhammad Farhan Abdillah

REVOU ASSIGNMENT

2 of 23

Introduction

With The Look Ecommerce Dataset in Google BigQuery, I answer the questions below with SQL:

INTERMEDIATE ASSIGNMENT:

1. Create a query to get the total users who completed the order and total orders per month (Jan 2019 until Apr 2022)!

2. Create a query to get average order value and total number of unique users, grouped by month (Jan 2019 until Apr 2022)!

3. Find the first and last name of users from the youngest and oldest age of each gender (Jan 2019 until Apr 2022)!

4. Get the top 5 most profitable product and its profit detail breakdown by month

5. Create a query to get Month to Date of total revenue in each product categories of past 3 months (current date 15 April 2022), breakdown by date!

ADVANCED ASSIGNMENT:

6. Find monthly growth of TPO (# of completed orders) and TPV (# of revenue) in percentage breakdown by product categories, ordered by time descendingly (Jan 2019 until Apr 2022). After analyzing the monthly growth, is there any interesting insight that we can get?

7. Create monthly retention cohorts (the groups, or cohorts, can be defined based upon the date that a user purchased a product) and then how many of them (%) coming back for the following months in 2019 – 2022. After analyzing the retention cohort, is there any interesting insight that we can get?

3 of 23

Understanding

Table and Variable

Entity Relationship Diagram of the dataset

4 of 23

INTERMEDIATE ASSIGNMENT Answer

5 of 23

Question 1: Table Schema and Result

6 of 23

All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426

SELECT

  FORMAT_DATE("%Y-%m", created_at) Month,

  COUNT(order_id) total_order,

  COUNT(DISTINCT(user_id)) total_user

FROM

  `bigquery-public-data.thelook_ecommerce.orders`

WHERE

  CAST(created_at AS date) BETWEEN "2019-01-01"

  AND "2022-04-30"

  AND status="Complete"

GROUP BY

  1

ORDER BY

  1

Question 1 Syntax:

7 of 23

Question 2: Table Schema and Result

8 of 23

Question 2 Syntax:�

SELECT

  FORMAT_DATE("%Y-%m",created_at) month_date,

  ROUND(AVG(sale_price),2) sales_average,

  COUNT(DISTINCT(user_id)) total_user

FROM

  `bigquery-public-data.thelook_ecommerce.order_items`

WHERE

  CAST(created_at AS date) BETWEEN "2019-01-01"

  AND "2022-04-30"

GROUP BY

  1

ORDER BY

  1

All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426

9 of 23

Question 3: Table Schema and Result

10 of 23

Question 3 Syntax:�

SELECT

  first_name,

  last_name,

  gender,

  age,

  CASE

    WHEN age_first_rank = 1 AND gender="F" THEN "Yes"

    WHEN age_first_rank = 1

  AND gender="M" THEN "Yes"

  ELSE

  "No"

END

  the_oldest,

  CASE

    WHEN age_lowest_rank = 1 AND gender="M" THEN "Yes"

    WHEN age_lowest_rank = 1

  AND gender="F" THEN "Yes"

  ELSE

  "No"

END

  the_youngest

FROM (

  SELECT

    first_name,

    last_name,

    gender,

    age,

    RANK() OVER(PARTITION BY gender ORDER BY age DESC) age_first_rank,

    RANK() OVER(PARTITION BY gender ORDER BY age ASC) age_lowest_rank,

  FROM

    `bigquery-public-data.thelook_ecommerce.users`

  WHERE

    CAST(created_at AS date) BETWEEN "2019-01-01"

    AND "2022-04-30")

WHERE

  age_first_rank =1

  OR age_lowest_rank=1

All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426

11 of 23

Question 4: Table Schema and Result

12 of 23

Question 4: Table Schema and Result

13 of 23

Question 4 Syntax:�

WITH

as

(SELECT

      FORMAT_DATE("%Y-%m",i.created_at) month_date,

      i.product_id,

      p.name,

      ROUND(SUM(i.sale_price)2) sum_sales,

      ROUND(SUM(p.cost),2) sum_cost,

      ROUND(SUM(i.sale_price)-SUM(p.cost),2) profit

    FROM

      `bigquery-public-data.thelook_ecommerce.order_items` i

    LEFT JOIN

      `bigquery-public-data.thelook_ecommerce.products` AS p

    ON

      i.product_id = p.id

    GROUP BY

      1,

      2,

      3),

as

(SELECT

    month_date,

    product_id,

    name,

    sum_sales,

    sum_cost,

    profit,

    RANK() OVER(PARTITION BY month_date ORDER BY 

profit DESC) ranking_permonth

FROM a)

SELECT *

FROM b

WHERE

  ranking_permonth <=5 

ORDER BY

  month_date ASC,

  profit DESC;

All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426

14 of 23

Question 5: Table Schema and Result

15 of 23

Question 5 Syntax:�

 SELECT

  date_time,

  ROUND(sum_sales,2) revenue_percategory,

  ROUND(SUM(sum_sales) OVER(PARTITION BY category, FORMAT_DATE("%Y-%m",date_time)

    ORDER BY

      date_time),2) revenue_percategory_monthtodate,

  category

FROM (

  SELECT

    EXTRACT(date

    FROM

      i.created_at) date_time,

    SUM(i.sale_price) sum_sales,

    p.category

  FROM

    `bigquery-public-data.thelook_ecommerce.order_items` i

  LEFT JOIN

    `bigquery-public-data.thelook_ecommerce.products` AS p

  ON

    i.product_id = p.id

  WHERE

    CAST(created_at AS date) BETWEEN "2022-02-01"

    AND "2022-04-15"

  GROUP BY

    date_time,

    category);

All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426

16 of 23

ADVANCED ASSIGNMENT Answer

17 of 23

Question 6: Table Schema and Result

18 of 23

Question 6 Syntax:�

 

WITH

as

(SELECT

  date(Date_trunc(o.created_at, month)) Month,

  p.category,

  COUNT(o.status) complete_order,

  SUM(i.sale_price) revenue

FROM `bigquery-public-data.thelook_ecommerce.orders` o

INNER JOIN `bigquery-public-data.thelook_ecommerce.order_items` i

ON o.order_id = i.order_id

INNER JOIN `bigquery-public-data.thelook_ecommerce.products` p

ON i.product_id =p.id

WHERE o.status = "Complete" 

GROUP BY 1,2),

as

(SELECT

  Month,

  category,

  complete_order,

  LAG(complete_order,1) OVER(partition by category ORDER BY Month) previous_complete_order,

  revenue,

  LAG(revenue,1) OVER(partition by category ORDER BY Month) previous_revenue

FROM a)

SELECT

  Month,

  category, 

  round((complete_order-previous_complete_order)/previous_complete_order,2) complete_order_growth,

  round((revenue-previous_revenue)/previous_revenue,2) revenue_growth

FROM b

WHERE month BETWEEN "2019-01-01" AND "2022-04-30"

ORDER BY 1 ASC2;

All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426

19 of 23

Insight from Question 6 Table

  • On average, the 3 highest categories of revenue growth (per month) is Jumpsuits & Rompers(37,8%), Blazers & Jackets (19,4%), dan Accessories (17,1%).
  • On average, the 3 highest categories of order growth (per month) is Leggings (16,7%), Dresses (15,0%), dan Socks (14,9%).
  • Although order growth in Skirt category is increasing 11,1%, the average of the revenue growth is -5,7%.
  • Clothing Sets Category is having bad order growth (0%), and the revenue growth is also bad (-3,5%)

20 of 23

Question 7: Table Schema and Result

21 of 23

Question 7 Syntax:�

WITH

  a AS (

  SELECT

    user_id,

    MIN(DATE(DATE_TRUNC(created_at, month))) OVER(PARTITION BY user_id) cohort_month,

    DATE(DATE_TRUNC(created_at, month)) order_time

  FROM

    `bigquery-public-data.thelook_ecommerce.orders`),

  b AS (

  SELECT

    *,

    DATE_DIFF(order_time, cohort_month, month) month

  FROM

    a),

  c AS (

  SELECT

    cohort_month,

    month,

    COUNT(user_id) count_order

  FROM

    b

  WHERE

    month<=12

  GROUP BY

    1,

    2),

 

 d AS (

  SELECT

    cohort_month,

    count_order count_order_inmonth

  FROM

    c

  WHERE

    month = 0)

SELECT

  c.cohort_month,

  c.month,

  d.count_order_inmonth,

  c.count_order,

  round(c.count_order/d.count_order_inmonth,3) percentage 

FROM

  c

LEFT JOIN

  d

ON

  c.cohort_month = d.cohort_month

WHERE

    c.cohort_month BETWEEN "2021-06-01"

    AND "2022-06-01"

ORDER BY

  1,

  2;

All of the queries can be seen at:�https://console.cloud.google.com/bigquery?sq=381261692230:508326d30bd94c979ef4d3dcf03be426

22 of 23

Cohort Analysis for Question 7

Insight:

  • We can see that the total order is increased from 2442 in June 2021 to 7951 in June 2022
  • Stickiness of the customer tend to increase over the month.
  • In May 2022, the retention rate decrease tp 14,8% from the previous month (16,7%) caused by incomplete dataset in June 2022

23 of 23

THANK YOU�VERY MUCH!

Please keep this slide for attribution

For more information, reach me in:

CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, and infographics & images by Freepik