Hotel Booking Cancellations
By: Huong Lenoch
Content
Hotel Booking System
Problem Statement
Hypothesis
Data Set
Tools and Methods
Data Visualization
Modeling and Evaluation
Conclusion
Problem Statement
Use the available hotel booking data set to predict cancellations, in order to produce better forecasts and reduce uncertainty in management decisions.
Hypothesis
A guest with a longer lead time* and previous cancellations is more likely to cancel their booking.
Hotel Booking System Diagram
Note: * Lead time is the number of days between the date the booking was placed and the arrival date.
Data
Methods and Tools
Machine learning was used to build a model that predicts booking cancellations. The target, “is canceled”, is binary (0: no; 1: yes), so two-class classification algorithms were chosen: Tree, Neural Network, and Logistic Regression.
Orange: User-friendly interface for charts and graphs; no coding required.
Excel: Some “ninja tricks” will be used to make the message clearer.
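The models in this deck were built in Orange without code. For readers who want to see what a two-class classifier does under the hood, here is a minimal hand-rolled logistic regression in plain Python. It is a sketch under stated assumptions, not the Orange workflow: the eight rows, the feature scaling (lead time in hundreds of days), and the names `fit_logistic` and `predict_proba` are all invented for illustration.

```python
import math

# Toy rows invented for illustration only (NOT taken from the booking data set).
# Features: (lead time in hundreds of days, number of previous cancellations).
X = [(0.05, 0), (0.10, 0), (0.20, 0), (0.30, 0),
     (1.80, 1), (2.20, 2), (2.50, 1), (3.00, 0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0: not canceled, 1: canceled


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that booking x is canceled."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


w, b = fit_logistic(X, y)
p_long = predict_proba(w, b, (3.0, 1))    # ~300 days ahead, one prior cancellation
p_short = predict_proba(w, b, (0.05, 0))  # ~5 days ahead, no prior cancellation
print(f"long lead time: {p_long:.2f}, short lead time: {p_short:.2f}")
```

A predicted probability above 0.5 is read as class 1 (canceled) and below 0.5 as class 0, which matches the binary target described above.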
Data Visualizations and Analytics
Figure 1: The Resort Hotel has more bookings but fewer cancellations.
Figure 2: The hotels have more cancellations during the high season from July to October.
Figure 3: Bookings that were canceled had a higher average daily rate ($102.62) than bookings that were not canceled ($90.05).
Effect of deposit type on cancellations
Figure 4: Over 99% of customers with the non-refund deposit type canceled their bookings.
Figure 5: Effect of booking history on cancellations
Figure 6: Bookings made a few days before the arrival date are rarely canceled; bookings made more than 200 days before the arrival date are canceled very often.
Figure 7: Effect of lead time on cancellations
The trend line shows a positive correlation between lead time and cancellation probability: the longer the lead time, the more likely the booking is to be canceled.
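The lead-time comparison behind Figures 6 and 7 can be reproduced by grouping bookings into lead-time buckets and computing the share canceled in each. The sketch below shows the calculation only. The `(lead_time_days, is_canceled)` pairs are made up for illustration and are not taken from the data set, and the 100-day bucket width is an arbitrary choice.

```python
from collections import defaultdict

# Hypothetical (lead_time_days, is_canceled) pairs, invented for illustration only.
bookings = [
    (3, 0), (5, 0), (12, 0), (40, 0), (45, 1), (90, 0),
    (120, 1), (150, 0), (210, 1), (250, 1), (300, 0), (365, 1),
]


def cancellation_rate_by_bucket(rows, width=100):
    """Return {bucket_start_day: share of bookings canceled} per lead-time bucket."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for lead_time, is_canceled in rows:
        start = (lead_time // width) * width  # e.g. 120 -> bucket starting at 100
        totals[start] += 1
        canceled[start] += is_canceled
    return {start: canceled[start] / totals[start] for start in sorted(totals)}


rates = cancellation_rate_by_bucket(bookings)
for start, rate in rates.items():
    print(f"{start:>3}-{start + 99} days: {rate:.0%} canceled")
```

Run against the real bookings, an upward trend across buckets would correspond to the positive correlation shown by the trend line.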
Hypothesis
A guest with a longer lead time and previous cancellations has a higher probability of canceling their booking.
All three models reached accuracy above 80%, which suggests that booking cancellations can be predicted by these models.
Metric | Tree | Logistic Regression | Neural Network
Accuracy | 0.8338 | 0.8099 | 0.8376
Precision | 0.7931 | 0.8340 | 0.8110
Recall | 0.7458 | 0.6078 | 0.7321
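Precision and recall pull in different directions across the three models, so a single combined number can make them easier to compare. One common choice is the F1 score, the harmonic mean of precision and recall. The deck does not report F1. The sketch below derives it from the precision and recall values in the table, and the helper name `f1_score` is introduced here for illustration.

```python
# Precision and recall as reported in the results table.
reported = {
    "Tree": {"precision": 0.7931, "recall": 0.7458},
    "Logistic Regression": {"precision": 0.8340, "recall": 0.6078},
    "Neural Network": {"precision": 0.8110, "recall": 0.7321},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = {name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()}
for name, score in sorted(f1.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {score:.4f}")
```

On this measure, Logistic Regression trails the other two models: it has the highest precision but also the lowest recall (0.6078), meaning it misses more of the bookings that are actually canceled.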
Conclusions
City Hotel
From July to October
High daily rate
Non-refund deposit type
Long lead-time
Previous cancellations
Lessons learned
References