Airbnb Listing & Review Analysis
Group L – Berlin – Team 4
FSDA Batch of May 2022
Table of Contents
01 Team
Meet our team
02 Overview
Data Dictionary
Company Overview
Host & Superhost
Dataset Overview
03 Methodology
Research Methodology
Problem background & support question
04 Data Analysis
Data Analysis
Data Visualization
Insight & Recommendation
Meet Our Team
ARTHUR J. ANDREAS
Project Lead
GALIH SATRIANI
Data Cleaning & Analysis Team
AJI NOOR
SHANELLA N. H
Data Visualization & Presentation Team
Data Cleaning & Analysis Team
Overview
02
Data Dictionary
listing_id = Listing ID
name = Listing Name
host_id = Host ID
host_since = Date the Host joined Airbnb
host_location = Location where the Host is based
host_response_time = Estimate of how long the Host takes to respond
host_response_rate = Percentage of times the Host responds
host_acceptance_rate = Percentage of times the Host accepts a booking request
host_is_superhost = Binary field to determine if the Host is a Superhost
host_total_listings_count = Total listings the Host has in Airbnb
host_has_profile_pic = Binary field to determine if the Host has a profile picture
host_identity_verified = Binary field to determine if the Host has a verified identity
neighborhood = Neighborhood the Listing is in
district = District the Listing is in
city = City the Listing is in
latitude = Listing's latitude
longitude = Listing's longitude
property_type = Type of property for the Listing
room_type = Type of room in Airbnb for the Listing
accommodates = Guests the Listing accommodates
bedrooms = Bedrooms in the Listing
amenities = Amenities the Listing includes
price = Listing price (in each country's currency)
LISTING DATASET
minimum_nights = Minimum nights per booking
maximum_nights = Maximum nights per booking
review_scores_rating = Listing's overall rating (out of 100)
review_scores_accuracy = Listing's accuracy score based on what's promoted in Airbnb (out of 10)
review_scores_cleanliness = Listing's cleanliness score (out of 10)
review_scores_checkin = Listing's check-in experience score (out of 10)
review_scores_communication = Listing's communication score (out of 10)
review_scores_location = Listing's location score within the city (out of 10)
review_scores_value = Listing's value score relative to its price (out of 10)
instant_bookable = Binary field to determine if the Listing can be booked instantly
REVIEW DATASET
listing_id = Listing ID, recorded each time a user places an order
review_id = Review ID (primary key)
review_date = Date the review was given
reviewer_id = ID of the user who gave the review
Company Overview
Airbnb, Inc. is an American company that operates an online marketplace for lodging (primarily homestays for vacation rentals) and tourism activities. The company was founded in 2008 and is based in San Francisco, California. The platform is accessible via the website and mobile app. Airbnb does not own any of the listed properties; instead, it profits by receiving a commission from each booking.
1.4M accounts
81K active hosts
4 room types
144+ property types
Host & Superhost
Airbnb has 3 types of hosts.
Methodology
03
Research Methodology
1 Set the problems
2 Data Gathering
3 Data Cleaning
4 Data Analysis
5 Data Visualization
6 Insight & Recommendation
What do we do?
Airbnb Listings & Reviews
Airbnb data for 250,000+ listings across 10 major cities, with 5 million reviews.
From Kaggle
DATA GATHERING
Cleaning the data before analysis
Combine the datasets, drop unneeded data, change data types, and remove nulls, irrelevant values, and erroneous data using Google Colaboratory (Python).
DATA CLEANING
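The cleaning steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the team's actual Colab notebook: the sample rows, the `clean_listings` helper, and the `t`/`f` encoding of binary fields are assumptions made for the example. Column names follow the data dictionary.

```python
import csv
import io

# Hypothetical sample in the shape of the listing dataset (not real data).
RAW_CSV = """listing_id,city,price,host_is_superhost,review_scores_rating
1,Paris,120,t,95
2,Paris,,f,88
3,Rome,-50,t,91
4,Rome,80,f,
1,Paris,120,t,95
"""

def clean_listings(text):
    """Drop null, duplicate, and invalid rows, then convert data types."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Remove nulls: skip rows with any empty field.
        if any(value == "" for value in row.values()):
            continue
        # Remove duplicates: keep the first row per listing_id.
        if row["listing_id"] in seen:
            continue
        price = float(row["price"])
        # Remove error data: a price must be positive.
        if price <= 0:
            continue
        seen.add(row["listing_id"])
        # Change data types: strings become int, float, and bool.
        cleaned.append({
            "listing_id": int(row["listing_id"]),
            "city": row["city"],
            "price": price,
            "host_is_superhost": row["host_is_superhost"] == "t",  # assumed t/f encoding
            "review_scores_rating": int(row["review_scores_rating"]),
        })
    return cleaned

listings = clean_listings(RAW_CSV)
print(listings)
```

In this sample only the first row survives: the others are dropped for a missing price, a negative price, a missing rating, and a repeated listing_id.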
Data after cleaning
Analyze the data using Google Colaboratory (Python) and Tableau to find insights and recommendations.
DATA ANALYSIS
Insight & Recommendation
Derive insights and recommendations from the analysis, and present the data visualization in a Tableau dashboard.
DATA VISUALIZATION
Problem Background
Decreasing Number of Bookings
Down 57%
Support Questions
Is there any correlation between the number of bookings and Superhost status?
Is there any correlation between the number of bookings and identity-verified hosts?
Is there any correlation between the number of bookings and Instant Bookable status?
Is there any correlation between the number of bookings and Room Type?
Dataset
Problems
Data Analysis
04
Why did the number of bookings decrease by 58% between 2014 and 2018?
EDA (Exploratory Data Analysis)
1. Host Growth
Down 51%
2. New Host Registrations
EDA (Exploratory Data Analysis)
3. Occupancy Rate
Low Occupancy Rate (<70%)
EDA (Exploratory Data Analysis)
How can we increase the number of bookings by 20% within the next year?
4. Number of Customers
Down 58%
Correlation Analysis
Identify the variables that correlate with the number of bookings
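As an illustration of this step, the sketch below ranks candidate variables by their Pearson correlation with the number of bookings. The yearly figures, the `pearson` helper, and the candidate names are hypothetical; the source does not state which correlation measure the team used, so Pearson is an assumption.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical yearly aggregates (illustrative values only, not the team's data).
bookings = [100, 140, 180, 230, 260]
candidates = {
    "superhosts": [10, 14, 19, 22, 27],
    "verified_hosts": [8, 12, 15, 20, 23],
    "instant_bookable": [30, 28, 33, 29, 31],
}

# Rank candidate variables by the strength of their correlation with bookings.
ranking = sorted(
    ((name, pearson(values, bookings)) for name, values in candidates.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Variables near the top of such a ranking are natural candidates for a regression model. A high correlation alone does not show that changing the variable would change bookings.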
Multivariate Linear Regression Analysis
| Metric | 2018 | Predicted | Increase |
| Number of Superhosts | 19,171 | 20,129 | 5% |
| Number of Verified Hosts | 15,301 | 17,749 | 16% |
| Number of Bookings | 342,435 | 413,097 | 20% |
Predict the number of bookings from the number of Superhosts and the number of verified hosts
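A regression of this kind, with two predictors and one target, can be sketched with ordinary least squares as below. This is an illustration under stated assumptions, not the team's model: the helpers `solve_linear_system`, `fit_ols`, and `predict` are hypothetical, and the observations are synthetic values generated from a made-up linear rule so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, targets):
    """Least-squares fit; returns [intercept, coefficient_1, coefficient_2, ...]."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Apply fitted coefficients to one observation."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Synthetic observations: (superhosts, verified hosts) -> bookings.
# The rule 5 + 10*s + 4*v is invented for this example only.
hosts = [(10, 8), (14, 9), (19, 15), (22, 16), (27, 23)]
bookings = [5 + 10 * s + 4 * v for s, v in hosts]

coefs = fit_ols(hosts, bookings)
print(coefs)
print(predict(coefs, (30, 25)))
```

With a fitted model, a scenario such as the one in the table is evaluated by calling `predict` with the increased Superhost and verified-host counts.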
Seasonality Trend of Bookings
Invest more in each city's high-booking season
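One way to find the high-booking season per city is to count bookings by city and calendar month, then take the busiest month. The sketch below assumes that each review stands in for one booking; the sample rows and the `peak_month_by_city` helper are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical (city, review_date) pairs; each review stands in for one booking.
reviews = [
    ("Paris", date(2017, 7, 3)),
    ("Paris", date(2017, 7, 19)),
    ("Paris", date(2017, 12, 24)),
    ("Rome", date(2017, 5, 2)),
    ("Rome", date(2018, 5, 11)),
    ("Rome", date(2018, 9, 30)),
]

def peak_month_by_city(rows):
    """Return {city: month number (1-12) with the most bookings}."""
    counts = Counter((city, day.month) for city, day in rows)
    peaks = {}
    for (city, month), n in counts.items():
        best = peaks.get(city)
        # Keep the month with the highest count seen so far for this city.
        if best is None or n > counts[(city, best)]:
            peaks[city] = month
    return peaks

print(peak_month_by_city(reviews))
```

Months are pooled across years here, so the result describes a recurring seasonal peak and not a single busy month in one year.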
Trend Line of Bookings
Customer Characteristics
RFM Analysis
There are 3 customer segmentation clusters:
Dividing customers and prospects into different groups
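The three RFM values can be derived per customer as sketched below. This is an illustration only: the sample bookings, the `SNAPSHOT` reference date, and the `rfm_table` helper are hypothetical, and using the listing price as the Monetary value is an assumption, since the review dataset itself carries no payment amount.

```python
from collections import defaultdict
from datetime import date

# Hypothetical bookings: (reviewer_id, review_date, listing price).
bookings = [
    (101, date(2018, 11, 20), 40.0),
    (101, date(2018, 6, 5), 35.0),
    (101, date(2017, 9, 14), 30.0),
    (202, date(2018, 12, 1), 400.0),
    (303, date(2016, 3, 8), 60.0),
    (303, date(2016, 8, 21), 55.0),
]
SNAPSHOT = date(2018, 12, 31)  # assumed reference date for Recency

def rfm_table(rows, snapshot):
    """Return {reviewer_id: (recency_days, frequency, monetary)}."""
    grouped = defaultdict(list)
    for reviewer_id, day, price in rows:
        grouped[reviewer_id].append((day, price))
    table = {}
    for reviewer_id, items in grouped.items():
        last_day = max(day for day, _ in items)
        recency = (snapshot - last_day).days          # days since the last booking
        frequency = len(items)                        # number of bookings
        monetary = sum(price for _, price in items)   # total spend (price as proxy)
        table[reviewer_id] = (recency, frequency, monetary)
    return table

for reviewer_id, values in rfm_table(bookings, SNAPSHOT).items():
    print(reviewer_id, values)
```

These raw values are typically converted to scores and then grouped into segments; the source does not specify which scoring or clustering method produced the three clusters.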
Customer Segmentation
Cluster Backpacker
This cluster makes up 52% of all customers. Their Recency score is low, which indicates that they are inactive customers. Their Frequency score is high. Their Monetary score is low regardless of how often they book, which suggests that they are attracted to low-priced hosts.
Cluster Exclusive Tripper
This cluster makes up 17% of all customers. Exclusive Trippers place high-value orders and have done so recently, but they book infrequently. If nurtured properly, this segment could become the most valuable segment for Airbnb.
If neglected, Exclusive Trippers are at risk of becoming customers who make a single high-value purchase and never come back.
Cluster Flashpacker
This cluster makes up 31% of all customers. Flashpackers do reasonably well on Recency, Frequency, and Monetary. Their total RFM score can be improved by increasing the Average Order Value (AOV) and building a healthy booking habit in this segment.
Customer Segmentation
Cluster Backpacker
Target Market = Cluster Backpacker & Flashpacker
Why? The medium-to-high Frequency of Backpackers and Flashpackers indicates good repeat orders and produces an income stream for Airbnb and its hosts. We need them to share their experience and attract new users to try Airbnb.
Cluster Exclusive Tripper
Cluster Flashpacker
DASHBOARD SNAPSHOT
Insight & Recommendation
About the Business Questions:
From our multiple linear regression analysis, we conclude that a 5% increase in the number of Superhosts and a 16% increase in the number of verified hosts would increase the number of bookings by 20% (ideal condition). Under a realistic condition, however, the predicted increase is only 13.12%.
About Customer Segmentation to improve our booking numbers:
Thank You!
Appendix & Data Source
Business Requirement Document :
https://docs.google.com/document/d/194Ts1SsLzsDNwUd7v0RGMv4PXllHPg6U/edit
Dataset Gathering : https://www.kaggle.com/datasets/mysarahmadbhat/airbnb-listings-reviews
Dataset Cleaning & Analysis : Upload kaggle.json before running the Google Colab notebooks
For Correlation : https://colab.research.google.com/drive/1cm9Yw5STufViVwCsOsLOQXZfx9yRopHb?usp=sharing
For Customer Clustering :
https://colab.research.google.com/drive/1nHLwMTD4mTWd4SGreSK9MZiKraPVwQux?usp=sharing
Data Visualization : https://public.tableau.com/app/profile/shanella6379/viz/GFPDashboardAirbnb-TeamL/Dashboard1?publish=yes
Others :
Marketing insight : https://www.referralcandy.com/blog/airbnb-referral-program