1 of 37

Unit 2 -

Data Representation and Analysis

Frequently Asked Questions

2 of 37

Hi! I’m Kira!

How to use this interactive FAQ document:

  • Open the slides in “Slideshow” mode.
  • Click on a bubble with a white background to jump to the content you want.
  • Click on Kira at any time to go back to the previous menu.
  • Use Ctrl + F to find keywords.

3 of 37

What type of question do you have?

I need help with representing data

I need help with data conversion

I need help understanding a concept

I need help with keeping my data clean

4 of 37

Which concept can we help you with?

RGB Color Coding

What is Data?

Hardware VS. Software

Code: Map Plotting

5 of 37

What is Data?

  • The term data refers to information that can be stored and edited by a computer.
  • There are two types of data: analog and digital.
  • Analog data: Data that humans can read directly. This data must be translated into digital form before a computer can use it.
    • Examples: paper copies, human speech, or the view around you.
  • Digital data: Data that can be read by a computer. This data is encoded and stored as a sequence of 0s and 1s (i.e., bits).

6 of 37

RGB Color Coding

Also known as Red-Green-Blue colors. Each pixel is made up of these three components, and the code given for each color tells the pixel how much of that color to show.

  • Uses hex code and is written #rrggbb (two hex digits each for red, green, and blue)
  • Each color component ranges from 0 (00 in hex) up to a maximum of 255 (FF in hex)
  • #000000 is the code for black (no red, green, or blue)
  • #FFFFFF is the code for white (red, green, and blue all at the maximum)
  • References:
    • Color Options
    • Color Names
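A minimal Python sketch of reading a #rrggbb code (Python is the language this unit uses for its plotting code). The function name hex_to_rgb is our own choice, not part of any library, and it assumes the code is exactly six hex digits after the #.

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Split a #rrggbb color code into red, green, and blue values (0 to 255)."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a code written as #rrggbb")
    # Each pair of hex digits is read as one base-16 number.
    red = int(code[0:2], 16)
    green = int(code[2:4], 16)
    blue = int(code[4:6], 16)
    return red, green, blue


print(hex_to_rgb("#000000"))  # black: (0, 0, 0)
print(hex_to_rgb("#FFFFFF"))  # white: (255, 255, 255)
```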

7 of 37

Hardware VS. Software

Hardware

  • The physical components of a computer
  • Examples: Keyboard, mouse, monitor, printer etc.

Software

  • The digital components of a computer: the programs and instructions it runs
  • Examples: Google Chrome, webmail, Paint, File Explorer, the operating system, etc.

8 of 37

I need help with data conversion.

What is Binary?

Bits and Bytes

Representing Characters with Binary (ASCII Code)

Representing Colors in Binary (RGB)

Representing Numbers with Binary

Encoding Vs. Decoding

Hex Code

QR Codes

9 of 37

Encoding VS. Decoding

Encoding

  • Translating analog data into digital data so a computer can read it
  • Called encryption when it is used to hide sensitive data before it is transmitted or stored

Decoding

  • Translating digital data back into analog data so a human can read it
  • Called decryption when it is used to reveal hidden data after it is transmitted or stored
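To make the idea concrete, here is a small Python sketch that encodes text into bits and decodes it back. The names encode and decode are our own, and the sketch assumes plain ASCII text (one byte per character). This is encoding only, not encryption: anyone with an ASCII chart can read the result.

```python
def encode(text: str) -> str:
    """Encode text as bits: one 8-bit group (1 byte) per character."""
    return " ".join(format(ord(ch), "08b") for ch in text)


def decode(bits: str) -> str:
    """Decode space-separated 8-bit groups back into readable text."""
    return "".join(chr(int(group, 2)) for group in bits.split())


message = encode("Hi")
print(message)          # 01001000 01101001
print(decode(message))  # Hi
```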

10 of 37

What is Binary?

  • The system that uses a series of 1s and 0s to give instructions to a computer.
  • The 1s and 0s in binary represent whether a circuit is open or closed.
  • 0 → off / open
  • 1 → on / closed

Figure: an open circuit represents 0, and a closed circuit represents 1.

11 of 37

Bits and Bytes

  • One binary digit is called a bit
  • Eight bits are called one byte
  • The byte is the standard unit for measuring data size and is represented by a capital B (a lowercase b stands for bits)
    • Kilobyte (KB) = 1,000 B = 8,000 bits
    • Megabyte (MB) = 1,000 KB = 1,000,000 B
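The unit chain above can be checked with a few lines of Python. The dictionary and function names are hypothetical, and the sketch uses the decimal units on this slide (1 KB = 1,000 B). Some computers count in powers of two instead (1,024 bytes), which would change the results.

```python
BITS_PER_BYTE = 8
# Decimal units, matching the slide: 1 KB = 1,000 B and 1 MB = 1,000 KB.
BYTES_PER_UNIT = {"B": 1, "KB": 1_000, "MB": 1_000_000}


def to_bits(amount: float, unit: str) -> float:
    """Convert an amount in B, KB, or MB into bits."""
    return amount * BYTES_PER_UNIT[unit] * BITS_PER_BYTE


print(to_bits(1, "B"))   # 8
print(to_bits(1, "KB"))  # 8000
print(to_bits(1, "MB"))  # 8000000
```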

12 of 37

Representing Characters with Binary (ASCII Code)

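  • Used for typed characters such as letters, punctuation, and symbols. (Numbers used as values are represented with the base-two number system instead.)
  • The chart provided uses 1 byte (8 bits) of data per character.
    • Standard ASCII uses 7 bits per character; the chart we use adds one extra bit to each character to make a total of 8 bits.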
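A short Python sketch that builds rows of an ASCII chart like the one on this slide. It uses Python's built-in ord() to get each character's ASCII number and assumes the extra eighth bit is a leading 0, which is the usual way 7-bit ASCII is stored in a byte. The function name ascii_chart is our own.

```python
def ascii_chart(text: str) -> list[tuple[str, int, str]]:
    """Return (character, ASCII number, 8-bit binary) for each character."""
    rows = []
    for ch in text:
        code = ord(ch)  # the character's number in the ASCII table
        if code > 127:
            raise ValueError(f"{ch!r} is not a standard ASCII character")
        # Standard ASCII needs only 7 bits; "08b" pads with a leading 0 to fill a byte.
        rows.append((ch, code, format(code, "08b")))
    return rows


for ch, code, bits in ascii_chart("Cat"):
    print(ch, code, bits)
# C 67 01000011
# a 97 01100001
# t 116 01110100
```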

13 of 37

Representing Numbers with Binary

  • Binary is a base-two number system.
  • The lowest-value bit is the one farthest to the right.
  • Moving left, each bit is worth twice the value of the bit to its right.

Value of each bit:   128   64   32   16    8    4    2    1
                       ×    ×    ×    ×    ×    ×    ×    ×
Binary number:         1    0    1    0    1    0    0    1

128 + 0 + 32 + 0 + 8 + 0 + 0 + 1 = 169
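The place-value method in the table can be written as a short Python sketch. Both function names are our own; Python's built-in int("10101001", 2) gives the same answer and can be used to double-check.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the place value of every bit that is 1 (rightmost bit = 1)."""
    total = 0
    place_value = 1
    for bit in reversed(bits):  # start with the lowest-value bit on the right
        if bit == "1":
            total += place_value
        place_value *= 2  # each bit is worth twice the one to its right
    return total


def decimal_to_binary(number: int, width: int = 8) -> str:
    """Build the bits left to right, subtracting each place value that fits."""
    if not 0 <= number < 2 ** width:
        raise ValueError(f"{number} does not fit in {width} bits")
    bits = ""
    for power in range(width - 1, -1, -1):  # 128, 64, ..., 1 when width is 8
        place_value = 2 ** power
        if number >= place_value:
            bits += "1"
            number -= place_value
        else:
            bits += "0"
    return bits


print(binary_to_decimal("10101001"))  # 169
print(decimal_to_binary(169))         # 10101001
```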

14 of 37

Representing Colors in Binary (RGB)

  • Tells the computer how much red, green, and blue should be used in each pixel.
  • Is written #rrggbb, where rr represents red, gg represents green, and bb represents blue.
  • Each color uses two hex digits, which together make one byte (8 bits) of information. The first hex digit of the pair gives the first four bits, and the second hex digit gives the last four bits.
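A sketch of the rule in the last bullet, turning each hex pair into its 8 bits. The function name color_to_binary is hypothetical, and #FF8000 (full red, about half green, no blue) is just an example color.

```python
def color_to_binary(code: str) -> list[str]:
    """Turn #rrggbb into three bytes: one 8-bit string each for red, green, blue."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a code written as #rrggbb")
    color_bytes = []
    for start in (0, 2, 4):
        pair = code[start:start + 2]
        # First hex digit -> first four bits; second hex digit -> last four bits.
        first_four = format(int(pair[0], 16), "04b")
        last_four = format(int(pair[1], 16), "04b")
        color_bytes.append(first_four + last_four)
    return color_bytes


print(color_to_binary("#FF8000"))  # ['11111111', '10000000', '00000000']
```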

More about Hex Code

15 of 37

Hex Code

  • Always starts with a # followed by the code
  • Each hex digit has 16 possible values: 0 to 9 and A to F (A = 10 through F = 15)
  • Two hex digits together represent numbers up to 255 (FF)
    • When 0 is counted, there are 256 possible combinations

Hex   Binary    Hex   Binary    Hex   Binary    Hex   Binary
0     0000      4     0100      8     1000      C     1100
1     0001      5     0101      9     1001      D     1101
2     0010      6     0110      A     1010      E     1110
3     0011      7     0111      B     1011      F     1111
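The table and the 255 limit can be reproduced in Python. This sketch looks each digit up in the string "0123456789ABCDEF"; the function names are our own. Note that A9 works out to 169, the same number used on the Representing Numbers with Binary slide.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_digit_to_bits(digit: str) -> str:
    """Look up the 4-bit pattern for one hex digit, as in the table."""
    value = HEX_DIGITS.index(digit.upper())  # 0 to 15
    return format(value, "04b")


def hex_pair_to_number(pair: str) -> int:
    """Two hex digits: the first digit counts 16s, the second counts 1s."""
    first, second = (HEX_DIGITS.index(d) for d in pair.upper())
    return first * 16 + second


print(hex_digit_to_bits("A"))    # 1010
print(hex_pair_to_number("A9"))  # 169
print(hex_pair_to_number("FF"))  # 255
```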

16 of 37

QR Codes

  • The boxes in the top left, top right, and bottom left corners help a scanner find the code and line it up.
  • An orientation box is in the lower right quadrant.
  • Black and white are used because they are high-contrast colors.
    • Some QR codes use other colors, but they may not be read as easily.
  • The squares (pixels) in the QR code are binary: each one is either dark or light.
  • A timing row and column use alternating dark and light squares to indicate the size of the QR code.

Parts of a QR Code

17 of 37

Parts of a QR Code

18 of 37

Code: Map Plotting

Import Mat Plot (Grid)

Data Parameter

Plot on Map

Show on display

19 of 37

Import Mat Plot (Grid)

  • import → brings in a module (code someone else has written) so your program can use it
  • matplotlib.pyplot → the plotting module that draws 2D images and charts, such as the grid in the data parameter
  • as plt → gives matplotlib.pyplot the short nickname plt, so later lines can write plt.imshow() instead of matplotlib.pyplot.imshow()*

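import matplotlib.pyplot as plt

# Each inner list is one row of the grid; each 0 or 1 is one square.
data = [[0, 1, 0, 1, 0, 0, 1, 1],
        [0, 1, 1, 0, 0, 1, 0, 1],
        [0, 1, 1, 0, 0, 0, 1, 1],
        [0, 1, 1, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 1, 0, 1],
        [0, 1, 1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 1],
        [0, 0, 1, 0, 0, 0, 0, 1]]

plt.imshow(data, cmap="gray")  # draw the grid in grayscale: 0 = black, 1 = white
plt.show()  # open a window that displays the image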

* The plt module is a part of the base code for the visualizer and not covered in the MS course.

20 of 37

Data Parameter

  • This code uses binary (0s and 1s) to encode a grid
  • Uses two colors, 'clear' and 'black': with cmap="gray", 1 is drawn white (clear) and 0 is drawn black
  • Values (columns) in a row are separated by , (commas)
  • Each row is enclosed in [ ] (opening and closing brackets), and rows are separated by commas
  • The data parameter encloses all the rows in one more pair of [ ] (opening and closing brackets)
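Each row of the grid is 8 bits, so it can also be read as one byte. The sketch below, which is our own addition and not part of the course code, reads every row as an ASCII character and shows that the picture spells out a hidden message. The helper name row_to_character is hypothetical.

```python
# The same grid as the data parameter in the course code.
data = [[0, 1, 0, 1, 0, 0, 1, 1],
        [0, 1, 1, 0, 0, 1, 0, 1],
        [0, 1, 1, 0, 0, 0, 1, 1],
        [0, 1, 1, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 1, 0, 1],
        [0, 1, 1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 1],
        [0, 0, 1, 0, 0, 0, 0, 1]]


def row_to_character(row: list[int]) -> str:
    """Read one row of 8 bits as a binary number, then look it up as ASCII."""
    number = 0
    for bit in row:
        number = number * 2 + bit  # shift left one place, then add the new bit
    return chr(number)


message = "".join(row_to_character(row) for row in data)
print(message)  # Secret!!
```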


21 of 37

Plot on Map

  • plt.imshow() → uses the plt nickname from the import line to draw the data as a 2D image (a grid of colored squares)
    • data → the grid that plt.imshow() draws
    • cmap="gray" → sets the display to a grayscale image


22 of 37

Show on display

  • plt.show() → opens a display window showing the image drawn by plt.imshow()


23 of 37

I need help with representing data.

Correlation VS. Causation

Parts of a Table

Scatter Plot

Line Graph

Bar and Column Graphs

Pie Chart

Trend

24 of 37

Parts of a Table

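        Small Travel Mugs    Large Travel Mugs
2019          4,296                5,537
2020          4,025                4,929
2021          3,458                5,273
2022          3,642                6,009
2023          4,200                5,728
2024          4,182                7,109

Header Row: the top row (Small Travel Mugs, Large Travel Mugs)
Data Labels: the first column (2019 through 2024)
Data: the numbers in the body of the table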

25 of 37

Scatter Plot

This chart uses dots to show two or more sets of data to see how closely they are correlated.

26 of 37

Line Graph

Line graphs look at changes over a period of time.

27 of 37

Bar and Column Graphs

This type of graph shows data using solid bars. Bar graphs run left to right (horizontal bars), and column graphs run up and down (vertical bars).

Figure: a column graph and a bar graph.

28 of 37

Pie Chart

This type of chart shows data as a part of the whole.
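A pie chart's slices come from simple arithmetic: each category's share of the total, as a percentage and as an angle out of 360 degrees. The sketch below applies that to the 2024 travel mug numbers from the Parts of a Table slide; the function name pie_slices is our own.

```python
def pie_slices(values: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return (percent of the whole, slice angle in degrees) for each category."""
    total = sum(values.values())
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in values.items()
    }


# 2024 travel mug numbers from the Parts of a Table slide.
mugs_2024 = {"Small Travel Mugs": 4182, "Large Travel Mugs": 7109}
slices = pie_slices(mugs_2024)
for name, (percent, degrees) in slices.items():
    print(f"{name}: {percent:.1f}% of the whole, {degrees:.1f} degrees")
```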

29 of 37

Correlation VS. Causation

Correlation

  • How two pieces of data are related.
  • The higher the degree of correlation, the more closely the two pieces of data change together.

Causation

  • When two pieces of data are connected so that changing one changes the other.
  • The data has a cause and effect relationship.
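One common way to put a number on correlation is Pearson's correlation coefficient, r, which runs from -1 (perfectly opposite) through 0 (no straight-line relationship) to +1 (perfectly together). The slide does not name a specific formula, so treat this Python sketch as one option; the function name is our own. Even a high r would not prove causation.

```python
from math import sqrt


def pearson_correlation(xs: list[float], ys: list[float]) -> float:
    """Return Pearson's r, a number from -1 to +1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How far each value sits from its own list's average.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Positive when the two lists tend to be above (or below) average together.
    together = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return together / spread  # fails if either list never changes (spread = 0)


# Yearly travel mug numbers (2019 to 2024) from the Parts of a Table slide.
small_mugs = [4296, 4025, 3458, 3642, 4200, 4182]
large_mugs = [5537, 4929, 5273, 6009, 5728, 7109]
print(round(pearson_correlation(small_mugs, large_mugs), 2))
```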

30 of 37

What is a Trend?

Trends are used to help predict future values. A trend is usually represented on a graph by a straight line.
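One common way to draw the straight trend line is a least-squares line of best fit. The Python sketch below fits one to the large travel mug numbers from the Parts of a Table slide and extends it one year; the function name trend_line is our own, and the 2025 value is only an estimate that assumes the trend continues.

```python
def trend_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares straight line: returns (slope, intercept) for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    rise = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    run = sum((x - mean_x) ** 2 for x in xs)
    slope = rise / run
    intercept = mean_y - slope * mean_x  # the line passes through the averages
    return slope, intercept


years = [2019, 2020, 2021, 2022, 2023, 2024]
large_mugs = [5537, 4929, 5273, 6009, 5728, 7109]
slope, intercept = trend_line(years, large_mugs)
prediction_2025 = slope * 2025 + intercept
print(f"Trend: about {slope:.0f} more large mugs each year")
print(f"Estimate for 2025: about {prediction_2025:.0f}")
```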

31 of 37

I need help with keeping my data clean.

Data Cleaning

Data Formatting

Data Bias

Noise and Outliers

What is Data Messiness?

32 of 37

What is Data Messiness?

Data is messy when it is formatted in multiple ways, shows bias in how the questions are worded, is incorrect, or contains unnecessary information.

Check out the information on how to clean up the messiness through Data Cleaning.

33 of 37

Data Cleaning

This process reviews the collected data to find and fix (or remove) bias, formatting errors, data noise, and outliers.

Data Formatting

Data Bias

Noise and Outliers

34 of 37

Data Bias

Data bias is an inaccurate or distorted representation of what is being measured.

Before gathering data, review your own questions to ensure you are not swaying your participants or data gathering.

Data bias can occur due to

  • The way a question is worded
  • The way the options are worded
  • Analyzing the data with a preconceived result

35 of 37

Data Formatting

When recording data for analysis, be sure to keep a uniform format. If you are looking for a correlation, remember to plan out what data you need and how it should be represented.

              Chocolate   Vanilla   Mint Chocolate   Strawberry   Coffee
Chocolate         11          9            7               8          3
White             13         15            3              18          9
Yellow            15         10            5               9          0
Marble             1          2            2               5          8
Red Velvet         2          5            0               0          1
Strawberry         1          4            1               8          0
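Uniform formatting can be partly automated. This Python sketch, using made-up survey answers, trims extra spaces and gives every answer the same capitalization before counting, so that "CHOCOLATE" and " Chocolate " are counted as the same flavor. The function name clean_entry is hypothetical.

```python
def clean_entry(raw: str) -> str:
    """Give one answer a uniform format: trimmed, single-spaced, Title Case."""
    return " ".join(raw.split()).title()


# Hypothetical raw survey answers, typed in different ways.
raw_answers = ["chocolate", " Chocolate ", "CHOCOLATE", "mint  chocolate", "Mint Chocolate"]

counts = {}
for answer in raw_answers:
    flavor = clean_entry(answer)
    counts[flavor] = counts.get(flavor, 0) + 1

print(counts)  # {'Chocolate': 3, 'Mint Chocolate': 2}
```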

36 of 37

Noise and Outliers

Noise and outliers distract from the data that shows correlation or causation.

Noise

  • Extra information that is not relevant to the questions being asked.
  • Data with a low correlation

Outliers

  • Data that is far outside the typical range
  • Data that is outside the normal range of behavior
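Outliers can be flagged with a simple rule of thumb. This Python sketch uses the common 1.5 × IQR rule (values far below the first quartile or far above the third), which is our choice and not something the slide specifies; the quiz scores are made up for illustration. It uses statistics.quantiles from the Python standard library (Python 3.8+).

```python
from statistics import quantiles


def find_outliers(values: list[float]) -> list[float]:
    """Flag values more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1  # interquartile range: the spread of the middle half
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical quiz scores; 12 is far below the rest.
scores = [78, 82, 85, 88, 90, 91, 93, 12]
print(find_outliers(scores))  # [12]
```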

37 of 37

We hope you keep learning and finding ways to use your new computational thinking!

Thank you!