1 of 12

Google Play Store

Dataset Analysis

Group 3:

Joe Zhang, Farris Al-Quqa, Ben Claton, Sydney Davis, Xinyi Zhang

2 of 12

Data Introduction

  • Collected on February 3rd, 2019
  • Collected by Lavanya Gupta, a software engineer whose research focuses on data science, machine learning, and deep learning
  • Reason for data collection
    • Apple App Store data vs Google Play Store data

3 of 12

Dataset and Variables

4 of 12

5 of 12

Our Question:

Which characteristic of Google Play apps is most essential to generating high revenue?

6 of 12

Variables and Modifications

  • Variables:
    • Variables of interest: category, price, content rating, installs
    • Revenue: the estimated total amount (in USD) spent on purchasing an app
      • Found by multiplying the app's price by the number of installs it received
  • Modifications:
    • The install counts were modified
      • The dataset reports installs as brackets rather than exact numbers, so the lower bound of each bracket (5, 50, 500, etc.) was used
      • For example, “5+, 50+, 500+” became “5, 50, 500”
      • Possible improvement: obtain a supplementary dataset that provides a more specific, numerical install variable
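
The modification and revenue calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the group's actual code: the function names, the dictionary keys, and the two sample rows are hypothetical, and it assumes installs are stored as strings such as "1,000+" and prices as strings such as "$4.99" or "0".

```python
def parse_installs(raw: str) -> int:
    """Turn an install bracket such as "1,000+" into its lower bound, 1000."""
    return int(raw.strip().rstrip("+").replace(",", ""))


def parse_price(raw: str) -> float:
    """Turn a price string such as "$4.99" into 4.99 (assumed format)."""
    return float(raw.strip().lstrip("$"))


def estimated_revenue(price: str, installs: str) -> float:
    """Revenue estimate in USD: price times the lower bound of installs."""
    return parse_price(price) * parse_installs(installs)


# Hypothetical rows, invented for illustration only (not taken from the dataset).
apps = [
    {"category": "GAME", "price": "$4.99", "installs": "1,000+"},
    {"category": "TOOLS", "price": "0", "installs": "500+"},
]

for app in apps:
    app["revenue"] = estimated_revenue(app["price"], app["installs"])
```

Because each bracket is replaced by its lower bound, a figure computed this way is a floor on purchase revenue rather than an exact total.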

7 of 12

8 of 12

9 of 12

10 of 12

11 of 12

12 of 12

Closing

  • Conclusion
    • Higher install volume correlates with higher revenue when price is held essentially constant
      • Means were largely the same
  • Further Considerations
    • Revenue streams other than the cost of installation
      • E.g., Lifestyle/Fitness-related apps
      • Unfortunately, data was not provided for this type of analysis
