Capstone Project
By
Ansh Bhatnagar
Sandeep Kumar Maurya
Data Science Trainee, AlmaBetter
WHY ANALYZE THE GOOGLE PLAY STORE?
What makes an App popular? Can we predict how popular it’s going to be?
Mobile App Market is set to grow 20% by 2023
Android Apps comprise 90% of the Mobile App Market
What are some interesting patterns in user behavior related to app usage & feedback?
Introduction
Problem Statement
So, what factors influence an app's success?
An app is said to be successful if it has:
Agenda
Dataset Preparation
dataset and User Reviews dataset.
1. App: Contains the name of the app for each observation.
2. Category: Contains the category to which the app belongs.
3. Rating: Contains the average rating of the app.
4. Reviews: Contains the number of reviews the app has received on the Play Store.
5. Size: Contains the amount of memory the app occupies on the device.
6. Installs: Contains the number of times the app has been downloaded and installed from the Play Store.
7. Type: Indicates whether the app is free or paid.
8. Price: Contains the price of the app if it is a paid app.
9. Content Rating: Contains the maturity rating of the app, i.e. the age group of the audience for which it is suitable.
10. Genres: Contains the genre to which the app belongs. Genres can be considered a further subdivision of Category.
11. Last Updated: Contains the date on which the latest update of the app was released.
12. Current Version: Contains the current version of the app available on the Play Store.
13. Android Version: Contains the Android versions on which the app is supported.
Attributes in Google Play store Data
Attributes in User reviews
OVERVIEW OF ANALYSIS
Predictive Modeling
Formulate a statistical model to forecast an outcome using relevant predictors
Data Cleaning
Understand the structure of the dataset and clean data before analysis
Data Exploration
Uncover initial patterns, characteristics, and points of interest using visual exploration
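The data cleaning step could look roughly like the sketch below. It assumes pandas and assumes the raw columns are stored as strings in formats such as "10,000+", "$4.99", "19M", "201k" and "Varies with device"; the toy rows and the helper name size_to_mb are illustrative and not taken from the project.

```python
import pandas as pd

# Toy rows in the raw string formats ASSUMED for the Play Store columns.
raw = pd.DataFrame({
    "Installs": ["10,000+", "1,000,000+", "500+"],
    "Price": ["0", "$4.99", "$0.99"],
    "Size": ["19M", "201k", "Varies with device"],
})


def size_to_mb(value):
    """Convert '19M' or '201k' to megabytes; any other text becomes NaN."""
    if value.endswith("M"):
        return float(value[:-1])
    if value.endswith("k"):
        return float(value[:-1]) / 1024
    return float("nan")


clean = pd.DataFrame({
    # Strip thousands separators and the trailing '+' before casting.
    "Installs": raw["Installs"].str.replace(r"[,+]", "", regex=True).astype(int),
    # Strip the currency symbol before casting.
    "Price": raw["Price"].str.replace("$", "", regex=False).astype(float),
    "Size_MB": raw["Size"].map(size_to_mb),
})
print(clean)
```

Leaving "Varies with device" as a missing value keeps those apps in the table while excluding them from numeric size statistics.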
Pairwise Plot- Ratings, Size, Installs, Reviews, Price
Correlation Heatmap
❏ There is a strong positive correlation between the Reviews and Installs.
❏ The Price is slightly negatively correlated with the Rating, Reviews, and Installs.
❏ The Rating is slightly positively correlated with the Installs and Reviews.
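A correlation matrix like the one behind the heatmap can be computed as in the sketch below. It assumes pandas and already-cleaned numeric columns; the sample values are made up for illustration and are not the project's data.

```python
import pandas as pd

# Hypothetical, already-cleaned numeric sample (not the project's data).
apps = pd.DataFrame({
    "Rating": [4.1, 4.5, 3.9, 4.7, 4.3],
    "Reviews": [120, 54000, 800, 910000, 15000],
    "Installs": [10000, 1000000, 50000, 50000000, 500000],
    "Price": [0.0, 0.0, 4.99, 0.0, 0.99],
})

# Pearson correlation between every pair of numeric columns.
corr = apps.corr(method="pearson")
print(corr.round(2))
```

The resulting square table is what a plotting library would render as a heatmap, for example with seaborn.heatmap(corr, annot=True).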
Percentage of Paid apps v/s Free apps
We observed that 92.20% of apps on the Play Store are free and only 7.80% are paid.
Content Rating
The plot above shows that the Everyone category accounts for the majority of apps.
A majority of the apps on the Play Store (81.80%) can be used by everyone. The remaining apps carry various age restrictions.
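Percentage breakdowns such as free versus paid, or the share of each content rating, can be obtained as in the sketch below. It assumes pandas; the helper name percentage_share and the toy rows are hypothetical.

```python
import pandas as pd


def percentage_share(frame, column):
    """Return the percentage of rows falling into each value of `column`."""
    return frame[column].value_counts(normalize=True).mul(100).round(2)


# Toy data for illustration only.
apps = pd.DataFrame({
    "Type": ["Free", "Free", "Paid", "Free", "Free"],
    "Content Rating": ["Everyone", "Teen", "Everyone", "Everyone", "Mature 17+"],
})

type_share = percentage_share(apps, "Type")
rating_share = percentage_share(apps, "Content Rating")
print(type_share)
print(rating_share)
```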
Count of Applications in each category
Family and Game apps have the highest market prevalence. Surprisingly, Tools, Business and Medical apps are also among the categories with the most applications.
Categories with the highest number of installs
The Game, Communication and Tools categories have the highest number of installs compared to other categories of apps.
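Per-category figures such as app count, total installs and mean rating can be derived with a single group-by, as in this sketch. It assumes pandas and numeric Installs and Rating columns; the category labels and values are toy data.

```python
import pandas as pd

# Toy data for illustration only.
apps = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS", "FAMILY", "FAMILY", "FAMILY"],
    "Installs": [1000000, 500000, 100000, 10000, 5000, 1000],
    "Rating": [4.5, 4.0, 4.2, 4.4, 3.8, 4.1],
})

# One row per category, ordered by total installs.
summary = (
    apps.groupby("Category")
    .agg(
        app_count=("Installs", "size"),
        total_installs=("Installs", "sum"),
        mean_rating=("Rating", "mean"),
    )
    .sort_values("total_installs", ascending=False)
)
print(summary)
```

Sorting by app_count instead of total_installs would rank categories by how crowded they are rather than by how often their apps are installed.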
Average rating of the apps
Top 10 installed apps in any category
This graph shows the top installed apps in the ‘Games’ category. Further looking into the play store reveals that these apps are light, casual, single player games.
Top Free Apps
Top Paid Apps Based on Revenue Generated
Revenue = Installs * Price
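The revenue formula above can be applied column-wise, as in this sketch. It assumes pandas and numeric Installs and Price columns; the app names and values are placeholders. If the install counts are bucketed lower bounds (an assumption here), the result is an estimate rather than an exact figure.

```python
import pandas as pd

# Placeholder apps and values for illustration only.
apps = pd.DataFrame({
    "App": ["App A", "App B", "App C"],
    "Installs": [100000, 5000, 1000000],
    "Price": [2.99, 9.99, 0.0],
})

# Revenue = Installs * Price, evaluated row by row.
apps["Revenue"] = apps["Installs"] * apps["Price"]

# Keep paid apps only and rank them by estimated revenue.
top_paid = apps[apps["Price"] > 0].sort_values("Revenue", ascending=False)
print(top_paid[["App", "Revenue"]])
```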
App Size Analysis
~0 to 100 MB in the intervals of 10 MB each.
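Grouping app sizes into 10 MB intervals between 0 and 100 MB can be sketched with pandas.cut. The sizes below are made-up values in megabytes, and the label format is an arbitrary choice.

```python
import pandas as pd

# Made-up app sizes in MB.
sizes_mb = pd.Series([3.2, 12.0, 12.5, 48.0, 95.0, 99.9])

bins = list(range(0, 101, 10))  # edges 0, 10, ..., 100
labels = [f"{lo}-{lo + 10} MB" for lo in bins[:-1]]

# Each interval is closed on the right; include_lowest also admits 0 itself.
size_group = pd.cut(sizes_mb, bins=bins, labels=labels, include_lowest=True)
counts = size_group.value_counts(sort=False)
print(counts)
```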
Android version based on each category
Percentage of Review Sentiments
The dataset obtained by merging the Play Store data with the User Reviews data contains 816 unique apps.
In the Sentiment column, 64% of the values are Positive, 22% are Negative and 14% are Neutral.
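Merging the two datasets and computing the sentiment shares can be sketched as below. It assumes pandas, a shared App column as the join key and an inner join; the slides do not state the join type, and the rows are toy data.

```python
import pandas as pd

# Toy stand-ins for the Play Store data and the User Reviews data.
apps = pd.DataFrame({"App": ["App A", "App B", "App C"], "Rating": [4.5, 4.0, 3.8]})
reviews = pd.DataFrame({
    "App": ["App A", "App A", "App B", "App B", "App D"],
    "Sentiment": ["Positive", "Negative", "Positive", "Neutral", "Positive"],
})

# Inner join (an assumption): keep only apps present in both tables.
merged = apps.merge(reviews, on="App", how="inner")

unique_apps = merged["App"].nunique()
sentiment_share = merged["Sentiment"].value_counts(normalize=True).mul(100)
print(unique_apps)
print(sentiment_share)
```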
Positive and Negative Reviews
Helix Jump has the highest Positive sentiment count in the merged dataset, with 209 positive reviews.
Angry Birds Classic has the highest Negative sentiment count in the merged dataset, with 147 negative reviews.
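Finding the apps with the most positive and the most negative reviews can be sketched with a cross-tabulation. It assumes pandas and a merged table with App and Sentiment columns; the app names below are placeholders.

```python
import pandas as pd

# Placeholder review rows for illustration only.
merged = pd.DataFrame({
    "App": ["Game X", "Game X", "Game X", "Game Y", "Game Y", "Game Y"],
    "Sentiment": ["Positive", "Positive", "Negative", "Negative", "Negative", "Positive"],
})

# Rows are apps, columns are sentiment labels, cells are review counts.
counts = pd.crosstab(merged["App"], merged["Sentiment"])

top_positive = counts["Positive"].idxmax()
top_negative = counts["Negative"].idxmax()
print(top_positive, top_negative)
```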
Is sentiment_subjectivity proportional to sentiment_polarity?
From the scatter plot above, sentiment subjectivity is not always proportional to sentiment polarity, although in most cases it behaves proportionally when the variance is very high or very low.
Correlation in the merged data frame
In this correlation matrix, Rating, Reviews, Size and Installs show no significant relationship with Sentiment polarity or Sentiment subjectivity.
Distribution of Apps updated over the Year and Month
Challenges Faced
Conclusions
92.19% of apps are free and 7.81% are paid.
81.80% of apps have an Everyone content rating.
The Events category has the highest mean rating (4.39) and the Dating category has the lowest (4.05).
Family, Game and Tools are the top three categories, with 1906, 926 and 829 apps respectively.
Most competitive category: Family
Category with the highest number of installs: Game
Tools, Entertainment, Education, Business and Medical are the top genres.
8783 apps have a size of 50 MB or less.
7749 apps have a rating above 4.0, counting both free and paid apps.
In the merged dataset, the overall sentiment is 64% Positive, 22% Negative and 14% Neutral.
Conclusions
It is advisable to develop a free app with a content rating of Everyone.
Percentage of apps that are top rated = 81.80%
There are 20 free apps that have been installed over a billion times
Minecraft is the only paid app with over 10M installs, and it has also generated the most revenue from installation fees alone.
Price, Rating and Size have little or no correlation with Sentiment Polarity.
The median size of the apps in the play store is 12 MB
Apps whose size varies with device have the highest average number of installs.
Apps larger than 90 MB have the highest average number of user reviews, i.e. they are more popular than the rest.
Helix Jump has the highest number of positive reviews and Angry Birds Classic has the highest number of negative reviews.