Analyzing NYC Income Data by Type

Andrew Finateri, Michael Conrad, Stefan Major

Project Goals and Conditions

The goal of this project is to analyze income data for New York City, broken down by income type.
One condition that could affect the datasets is the number of tax filers and payers and the amount of income reported under each income type.
Another condition is cash transactions that are never reported on a tax filing.
We chose to analyze this data because, no matter what is happening in the world, the economy as a whole is a constant topic of discussion.
We expect the data to follow mostly consistent trends, with slight deviations caused by outliers.

Business Understanding

Since the pandemic, the government and businesses have been trying to fix wages and salaries.
Dividends and interest are directly tied to investment in businesses, to banks lending money, and to people earning a return on the money they deposit in banks.
Business income is the money people make from businesses.

Data Preparation

Original uncleaned data

Data Preparation

Steps required to clean data and make it usable.
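
The slide does not list the individual cleaning steps, so the following is only a minimal pandas sketch of what such a cleanup might look like. The column names (`income_type`, `amount`) and the sample rows are hypothetical placeholders, not the project's actual data.

```python
import io

import pandas as pd

# Hypothetical raw extract: placeholder values, not the project's data.
RAW_CSV = """income_type,amount
Wages and Salaries,"1,200"
 dividends ,300
Business Income,
Business Income,
Interest,$45
"""


def clean_income(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalize labels, parse amounts as numbers, drop unusable rows."""
    df = raw.copy()
    # Trim whitespace and lowercase the income type labels.
    df["income_type"] = df["income_type"].str.strip().str.lower()
    # Strip currency symbols and thousands separators, then convert to float.
    df["amount"] = pd.to_numeric(
        df["amount"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )
    # Drop rows with no usable amount, then drop exact duplicates.
    df = df.dropna(subset=["amount"]).drop_duplicates()
    return df.reset_index(drop=True)


cleaned = clean_income(pd.read_csv(io.StringIO(RAW_CSV)))
print(cleaned)
```

Keeping the cleanup in one function makes it easy to rerun on a larger extract of the same shape.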

Data Preparation

Steps required to plot data as a scatterplot and pie chart.
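
As a rough illustration of the plotting step, a scatterplot and a pie chart could be produced with matplotlib along these lines. The numbers, axis choices, and output filename below are assumptions made for the sketch and do not come from the project's dataset.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Placeholder values for illustration only; not the project's figures.
income_types = ["Wages and Salaries", "Dividends", "Interest", "Business Income"]
filers = [50, 12, 20, 8]             # hypothetical number of filers per type
amounts = [400.0, 60.0, 25.0, 90.0]  # hypothetical total income per type

fig, (ax_scatter, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Scatterplot: one labeled point per income type.
ax_scatter.scatter(filers, amounts)
for label, x, y in zip(income_types, filers, amounts):
    ax_scatter.annotate(label, (x, y))
ax_scatter.set_xlabel("Number of filers")
ax_scatter.set_ylabel("Total income")
ax_scatter.set_title("Income by type")

# Pie chart: each type's share of total income.
ax_pie.pie(amounts, labels=income_types, autopct="%1.1f%%")
ax_pie.set_title("Share of total income")

fig.tight_layout()
fig.savefig("income_by_type.png")
```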

Data Analysis

Azure Construction

Using previous assignments as references, we created a working ML model.
Split the data 50/50 into training and testing sets.
Use the training half to train the model with a multiclass decision forest.
Score the model to apply the trained algorithm to the testing half of the data.
Evaluate the model to produce visualizations of the results.
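
The same split, train, score, and evaluate flow can be sketched outside Azure. The example below uses scikit-learn's `RandomForestClassifier` as a stand-in for the Azure multiclass decision forest (the two are related but not identical) and synthetic data from `make_classification`, since the project's rows are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 3 classes. Not the project's dataset.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4,
    n_redundant=0, n_classes=3, random_state=0,
)

# Split Data: 50% training, 50% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# Train Model: a random forest stands in for the multiclass decision forest.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Score Model: predict labels for the held-out half.
predictions = model.predict(X_test)

# Evaluate Model: overall accuracy plus a per-class confusion matrix.
accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions)
print(f"accuracy: {accuracy:.2f}")
print(matrix)
```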

Azure Analysis

The overall model had an accuracy of zero.
This was due to the dataset having only 10 rows, each of which was an average.
With more data, we could have trained the model better and expected more accurate classification predictions from the system.
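
One way an accuracy of exactly zero can arise on so few rows is sketched below. It assumes (the slides do not confirm this) that each of the 10 rows carried its own distinct class label; the feature values are placeholders, and scikit-learn's random forest again stands in for the Azure component.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical setup: 10 rows, one feature, each row its own class label.
X = [[float(i)] for i in range(10)]
y = [f"class_{i}" for i in range(10)]

# Same 50/50 split as in the pipeline: 5 rows to train, 5 rows to test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# The model can only predict labels it saw during training, and none of
# the test labels are among them, so no prediction can be correct.
unseen = set(y_test) - set(y_train)
accuracy = accuracy_score(y_test, predictions)
print(len(unseen), accuracy)
```

Under that assumption every test label is unseen at training time, so the accuracy is zero regardless of how the forest is configured; only more rows per class would change that.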

Conclusion

From our data and analysis, we can conclude that there are still errors in our economic system.
If we were to do the project again, we would use a larger dataset with more widely scattered data values, which would let us train the model better and make it more accurate.