Predictive Analytics for Business

Project: Predicting Diamond Prices

Ishita Malhotra

Mentor, Project Reviewer & Knowledge Moderator

1 year 10 months with Udacity

Agenda

  • Project Summary
  • Pain point #1: Regression equation
  • Pain point #2: Price calculations
  • Pain point #3: Scatterplot visualization
  • Q&A

Project Summary

A jewelry company wants to put in a bid to purchase a large set of diamonds, but is unsure how much it should bid.

Skills that will be learned

  • Linear regression analysis
  • Recommendation and decision making for business

Tips for starting the project

  • Using formulae and creating charts in Excel
  • Do not plagiarise the project
  • Use the Knowledge platform to get your doubts clarified

Regression Equation

For this project, the regression equation has already been provided.

Note:

Do not come up with a new regression equation.

Regression Equation continued

The prices of the existing diamonds are already provided; they serve as the "historical data" used to predict prices for the new diamonds.

Therefore, apply the regression equation to the “new_diamonds” dataset to obtain the “predicted prices”.

Note:

The bidding price is not the same as the predicted price. It is calculated from the aggregated predicted price of the “new_diamonds” dataset.
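As a rough illustration of the step above, the following Python sketch applies a linear equation to each row of a stand-in for “new_diamonds” and then sums the predicted prices. The intercept, the coefficient, the predictor name "carat", and the sample rows are all placeholders chosen for this example; they are not the equation or data supplied with the project, so substitute the provided values. The project itself can be done in Excel with the same arithmetic.

```python
# Placeholder equation for illustration only: replace the intercept and
# coefficients with the regression equation provided in the project.
# The predictor name "carat" is an assumption made for this sketch.
INTERCEPT = 100.0
COEFFICIENTS = {"carat": 2000.0}


def predict_price(diamond, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Apply a linear regression equation to one diamond (a dict of numeric predictors)."""
    return intercept + sum(coef * diamond[name] for name, coef in coefficients.items())


# Hypothetical rows standing in for the "new_diamonds" dataset.
new_diamonds = [{"carat": 0.5}, {"carat": 1.0}, {"carat": 1.5}]

# One predicted price per new diamond.
predicted_prices = [predict_price(d) for d in new_diamonds]

# The aggregated predicted price is the starting point for the bid;
# the bid itself is a separate calculation.
total_predicted = sum(predicted_prices)
print(predicted_prices, total_predicted)
```

The sketch stops at the aggregated predicted price on purpose: how the bid is derived from that total comes from the project instructions, not from the regression equation.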

Price Calculations

Applying the regression equation to the existing “diamonds” dataset may not yield the same prices as those provided in the dataset.

Price Calculations continued

The regression formula is a best-fit line through all of the diamonds in diamonds.csv. It is not a formula for determining the exact price of any one diamond. On the whole, it captures the relationship between these variables and price. Other factors may also affect the exact price of the existing “diamonds”.
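To make this point concrete, here is a minimal Python sketch that fits a least-squares line to a few made-up points and compares the fitted prices with the actual ones. The four data points and the single predictor "carat" are assumptions for illustration only; they are not taken from diamonds.csv, and the project's provided equation should not be replaced by this fit.

```python
# Hypothetical data for illustration only (not from diamonds.csv).
carats = [0.5, 1.0, 1.5, 2.0]
prices = [1200.0, 2900.0, 4100.0, 6800.0]

n = len(carats)
mean_x = sum(carats) / n
mean_y = sum(prices) / n

# Ordinary least-squares slope and intercept for a single predictor.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(carats, prices)) / sum(
    (x - mean_x) ** 2 for x in carats
)
intercept = mean_y - slope * mean_x

# Fitted price for each diamond, and the gap between actual and fitted.
fitted = [intercept + slope * x for x in carats]
residuals = [y - f for y, f in zip(prices, fitted)]

for x, y, f, r in zip(carats, prices, fitted, residuals):
    print(f"carat={x}: actual={y:.0f}, fitted={f:.0f}, residual={r:.0f}")
```

In this sketch the individual residuals are not zero, yet they cancel out across the dataset. That is the sense in which the line describes the overall relationship without reproducing any single diamond's price.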

Scatterplot Visualization

Tips for scatterplot:

  • A scatterplot accepts only numeric fields.
  • Conventionally, the target variable is plotted on the y-axis and the independent (or predictor*) variable on the x-axis.
  • Include a chart title.
  • Provide axis labels.
  • Include a legend (i.e., the key for each color).

*Predictor variable: Used to predict the target variable

Q&A