1 of 12

Unlocking Excel for�Data Analysis and Statistics

The 20% of core skills that empower you to address 80% of tasks

2 of 12

�File Types

*.csv

  • Comma Separated Values file
  • Can use other delimiters
    • tab
    • space
    • etc.
  • Contain a single sheet (“flat files”)
  • No formulas or special formatting

*.xlsx

  • Excel worksheet files
  • Can contain multiple sheets (tabs)
  • Can include formulas and calculated values
  • Can include formatting
  • Can include graphs

3 of 12

�In-Class Data Set: Loan Approval

Navigate here and download the csv file containing the dataset ( )

    • Open the csv file in Excel (you may need to “unzip” it first)
    • Save the file using the xlsx format to a MAT240 folder on your computer (we’ll use this file often)
    • Double-Click on the tab at the bottom of the spreadsheet and rename it to RawData

A Note on optional, but good practice: Right-Click (or ctrl+Click for mac) on the tab you just renamed and select Protect Sheet select OK with the default settings

      • This step prevents us from being able to edit the raw data (accidentally or otherwise)
      • If you’ve done it correctly, you should see a lock icon next to the sheet name and if you attempt to edit a cell, you’ll be warned that you cannot do so without unprotecting the sheet

We won’t do this in MAT240, but it is something you should consider doing generally

4 of 12

�Explore the Data Set

Answer the following questions – discuss with people around you

    • How many columns are in the data set?
    • What does each column contain? Is the meaning clear?
    • How many rows are in the data set?
    • What does each row represent? Is the meaning clear?

5 of 12

�Working with your Data

Take the following steps to allow manipulation of your data

    • Use the plus button (+) in the tab navigation bar to create a new sheet
    • Call this sheet WrangledData
    • While not always necessary, we’ll create a copy of our raw data here
      • Don’t use copy/paste – we “hard code” values only when we have no other option
      • We’ll instead reference the cells in the RawData sheet
      • In cell A1 of your new sheet, type: = RawData!A1
        • The format here is SheetName followed by an exclamation point (!) followed by the location of the cell being copied
      • Now you’ll need to populate all of the cells
        • We can do this by grabbing the bottom-right corner of cell A1 on our new sheet and dragging to enclose an area matching the dimensions of the data set on the RawData tab, or…
        • Adjusting what we’ve typed into cell A1 to be: = RawData!A1:RawData!L45001

6 of 12

�Let’s Create Some New Features

Complete each of the following tasks, discussing with people around you…

    • Create a new column PersonIncome_k, which will contain the applicant’s income in 1000’s of dollars. Populate that column with the appropriate values.
    • Validate the loan_percent_income column by using two other existing columns in the data set. Discuss any drawbacks you see to the existing loan_percent_income column.
    • As of September 2024, the average personal loan amount borrowed was $11,687 (Investopedia). Create another new column AmtAboveAvg which contains the dollar value above the average borrowed amount for each requested loan. Negative numbers will indicate a below average requested amount.
      • For increased transparency, add the $11,687 as an Assumed Value on a new sheet/tab called AssumedValues. Label that cell so that you know what it contains and then carry out the calculation by referencing this cell. (Hint. You’ll want to anchor your reference by using $ signs)

7 of 12

�Built-In Functionality

In addition to arithmetic operations like those you used in completing the previous tasks, Excel has many built-in functions we can use to transform or summarize data. Try each of the following:

    • Use the IF() function to create a new column, AboveAvg, which contains yes if the loan amount is above the average initial amount borrowed from September 2024.
      • You can either use your new AmtAboveAvg column or calculate the result from the loan_amount and the $11,687 from the AssumedValues sheet.
    • Use the AVERAGE() function to determine the average initial loan_amount for this data set.
      • You can do this on a new sheet called SummaryStatistics – make sure to label your value so that you know what it corresponds to.

8 of 12

�Filtering

You can quickly filter your spreadsheet to see only certain rows

    • Click on your RawData tab
    • Choose the Data menu from the ribbon
    • Click the Filter button (the icon looks like a funnel)
    • This adds drop-down menus to the first cell in each populated column
    • Click the drop-down menu in the loan_intent column and filter to see only MEDICAL loans
    • Now use the loan_grade filter to show only loans graded “A” or “B”
    • Click both drop-downs and revert back to showing all the rows

9 of 12

�Grouped Summaries with Pivot Tables

Unfortunately filtering your data won’t change your calculations – we can use pivot tables to calculate quantities by group instead

    • Go to your RawData tab
    • Click on the Insert menu from the ribbon and choose Pivot Table
    • In the Source field for the data you want to analyse use RawData!A1:N45001
    • Choose to place the pivot table in a new worksheet (new tab)
    • Drag the loan_intent column to the rows field and loan_amnt column to the values field
    • Click on the dropdown in the values field, choose Value Field Settings, and select AVERAGE to calculate group averages instead of group totals

10 of 12

�Adding More Summaries

You can include additional summaries over your groups

    • Drag the loan_intent column into the values field (it will compute counts by default)
    • Drag the loan_amnt column into the values field again and use the dropdown again to choose the standard deviation summary value
    • Include additional summaries if you like
    • Rearrange the order of the items in the values field to reorder the columns in your pivot table
    • You can include more complex groupings by dragging additional column names into the rows field

11 of 12

�Coming Soon…

That’s enough of an overview for now, but in the coming class meetings we’ll learn to…

    • construct and interpret appropriate visualizations
    • conduct more detailed exploratory analyses
    • compute probabilities
    • conduct statistical tests

12 of 12

�Next Time…

  • What we’ll be doing…
    • Descriptive Statistics: Organizing, summarizing, visualizing, and interpreting sample data
  • How to prepare…
    • Read sections 1.4 and 2.1 – 2.6 in the textbook

Homework: Complete Homework 1 (Terminology and Sampling Methods) on MyOpenMath if you haven’t done so already