1 of 10

Analyzing NYC Income Data by Type

Andrew Finateri, Michael Conrad, Stefan Major

2 of 10

Project Goals and Conditions

  • The goal of this project is to analyze income data for New York City, broken down by income type
  • One condition that could affect the dataset is the number of tax filers and how their income is spread across the various income types
  • Another condition is cash transactions that are never reported on a tax filing
  • We chose this data because, whatever else is happening in the world, the economy is a consistently discussed topic
  • We expect the data to show clear trends, with small deviations caused by outliers

3 of 10

Business Understanding

  • Since the pandemic, the government and businesses have been trying to fix wages and salaries
  • Dividends and interest correspond directly to investment in businesses, to banks lending money, and to people earning money on what they deposit in banks
  • Business income is the money people make from businesses

4 of 10

Data Preparation

Original uncleaned data

5 of 10

Data Preparation

Steps required to clean data and make it usable.
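
A minimal sketch of what such cleaning might look like. The column names (`Year`, `Income Type`, `Amount`) and every value in `RAW` are hypothetical and used only for illustration; the actual dataset layout may differ.

```python
import csv
import io

# Hypothetical raw extract: column names and values are illustrative only.
RAW = """Year,Income Type,Amount
2019,Wages and Salaries,"1,200"
2019,Dividends,
2020, wages and salaries ,"1,350"
2020,Business Income,n/a
"""


def clean_rows(text):
    """Normalize labels, parse amounts, and drop rows without a usable amount."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Remove thousands separators and surrounding whitespace.
        raw_amount = (row["Amount"] or "").replace(",", "").strip()
        try:
            amount = float(raw_amount)
        except ValueError:
            continue  # skip blanks and placeholders such as "n/a"
        cleaned.append({
            "year": int(row["Year"]),
            "income_type": row["Income Type"].strip().title(),
            "amount": amount,
        })
    return cleaned


rows = clean_rows(RAW)
```

Dropping unusable rows is only one option; filling them with an average is another, at the cost of flattening the variation a model would later need.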

6 of 10

Data Preparation

Steps required to plot data as a scatterplot and pie chart.
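
One possible way to prepare and draw the two charts, as a sketch. The records, the income type names, and the helper names (`pie_shares`, `scatter_points`, `plot_income`) are all hypothetical. The drawing step assumes matplotlib is installed, so it is imported only inside `plot_income`.

```python
# Hypothetical cleaned records (year, income type, amount); values are illustrative only.
RECORDS = [
    (2019, "Wages", 60.0), (2019, "Dividends", 25.0), (2019, "Business", 15.0),
    (2020, "Wages", 90.0), (2020, "Dividends", 5.0), (2020, "Business", 5.0),
]


def pie_shares(records):
    """Return each income type's share of the total across all years."""
    totals = {}
    for _, kind, amount in records:
        totals[kind] = totals.get(kind, 0.0) + amount
    grand_total = sum(totals.values())
    return {kind: total / grand_total for kind, total in totals.items()}


def scatter_points(records, kind):
    """Return (year, amount) pairs for one income type, sorted by year."""
    return sorted((year, amount) for year, k, amount in records if k == kind)


def plot_income(records):
    """Draw a scatterplot by year and a pie chart of shares (requires matplotlib)."""
    import matplotlib.pyplot as plt  # third-party; assumed to be installed

    shares = pie_shares(records)
    fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
    for kind in shares:
        years, amounts = zip(*scatter_points(records, kind))
        left.scatter(years, amounts, label=kind)
    left.set_xlabel("Year")
    left.set_ylabel("Amount")
    left.legend()
    right.pie(list(shares.values()), labels=list(shares.keys()), autopct="%1.1f%%")
    fig.tight_layout()
    return fig


shares = pie_shares(RECORDS)
dividend_points = scatter_points(RECORDS, "Dividends")
```

Calling `plot_income(RECORDS)` returns the figure, which can then be shown or saved.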

7 of 10

Data Analysis

8 of 10

Azure Construction

  • Using previous assignments as references, we created a working ML model
  • Split the data 50/50 into training and testing sets
  • Used the training half to train the model with a multiclass decision forest
  • Scored the model to apply the trained algorithm to the testing half of the data
  • Evaluated the model to produce visualizations of the results
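
The split, train, score, and evaluate steps above can be sketched outside Azure as follows. This is not the Azure pipeline itself: a simple nearest-neighbor classifier stands in for the multiclass decision forest so that the example needs only the standard library. The data, labels, and function names are hypothetical, and the split is a deterministic alternating one rather than a random one.

```python
# Hypothetical labelled rows: (feature vector, class label). Values are illustrative only.
DATA = [
    ((1.0, 1.2), "low"), ((0.9, 1.0), "low"), ((1.1, 0.8), "low"), ((1.2, 1.1), "low"),
    ((5.0, 5.1), "high"), ((5.2, 4.9), "high"), ((4.8, 5.3), "high"), ((5.1, 5.0), "high"),
]


def split_half(rows):
    """Split 50/50: even-indexed rows for training, odd-indexed rows for testing."""
    return rows[0::2], rows[1::2]


def train(rows):
    """'Train' a 1-nearest-neighbor model, which just memorizes the training rows."""
    return list(rows)


def score(model, features):
    """Predict the label of the closest training row (squared Euclidean distance)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(model, key=lambda row: distance(row[0], features))[1]


def evaluate(model, test_rows):
    """Return accuracy: the fraction of test rows whose label is predicted correctly."""
    hits = sum(score(model, features) == label for features, label in test_rows)
    return hits / len(test_rows)


train_rows, test_rows = split_half(DATA)
model = train(train_rows)
accuracy = evaluate(model, test_rows)
```

With well-separated classes and several rows per class, even this simple stand-in classifies the held-out half correctly, which is the behavior the pipeline relies on.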

9 of 10

Azure Analysis

  • The model had an overall accuracy of zero
  • This was because the dataset had only 10 rows, each of which was an average
  • With more data we could have trained the model better and would expect more accurate classification predictions from the system
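
One possible mechanism behind a zero score, offered as an assumption rather than a diagnosis: if each of the 10 rows carries its own class label, a 50/50 split leaves the test half with labels the model never saw, so no classifier can predict them. The sketch below computes that upper bound on accuracy; the labels and the function name `accuracy_ceiling` are hypothetical.

```python
def accuracy_ceiling(train_labels, test_labels):
    """Best accuracy any classifier could reach: a label can only be predicted
    if it appeared in the training half."""
    seen = set(train_labels)
    return sum(label in seen for label in test_labels) / len(test_labels)


# Hypothetical case: ten rows, each with a distinct class label, split 50/50.
unique_labels = [f"class_{i}" for i in range(10)]
ceiling_unique = accuracy_ceiling(unique_labels[:5], unique_labels[5:])

# Hypothetical contrast: ten rows sharing two labels, split 50/50.
shared_labels = ["low", "high"] * 5
ceiling_shared = accuracy_ceiling(shared_labels[:5], shared_labels[5:])
```

In the first case the ceiling is zero regardless of the algorithm; in the second, every test label has been seen, so the ceiling is one and accuracy depends on the model.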

10 of 10

Conclusion

  • From our data and analysis, we can conclude that there are still errors in our economic system
  • If we were to do the project again, we would use a larger dataset with more varied data values, which would let us train a more accurate model