1 of 33

DATA PRESENTATION

2 of 33

Numerical Summaries of Data

  • Data are the numeric observations of a phenomenon of interest. The totality of all observations is the population. A portion selected for analysis is a sample, ideally drawn at random (a random sample).
  • We gain an understanding of this collection, possibly massive, by describing it numerically and graphically, usually with the sample data.
  • We describe the collection in terms of shape, outliers, center, and spread (SOCS).
  • The center is commonly measured by the mean.
  • The spread is commonly measured by the variance or its square root, the standard deviation.

3 of 33

Populations & Samples

  • A population is described, in part, by its parameters, e.g., mean (μ) and standard deviation (σ).
  • A random sample of size n is drawn from a population and is described, in part, by its statistics, e.g., mean (x-bar) and standard deviation (s). The statistics are used to estimate the parameters.

4 of 33

What is this “n–1”?

  • The population variance is calculated with N, the population size. Why isn’t the sample variance calculated with n, the sample size?
  • The true variance is based on data deviations from the true mean, μ.
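  • The sample calculation is based on the data deviations from x-bar, not μ. X-bar is an estimator of μ: close, but not the same. The squared deviations from x-bar never sum to more than the squared deviations from μ, so dividing by n would tend to underestimate the variance. The n-1 divisor compensates for this error in the mean estimation.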
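
The effect of the divisor can be checked numerically. The following is a minimal sketch only: the data values and the function names `sample_variance` and `population_variance` are illustrative assumptions, not taken from the slides.

```python
import statistics

def population_variance(xs):
    """Variance with divisor N (treats xs as the whole population)."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n

def sample_variance(xs):
    """Variance with divisor n-1 (treats xs as a sample; deviations from x-bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Hypothetical observations, chosen only for easy arithmetic.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))  # divisor N
print(sample_variance(data))      # divisor n-1, always the larger of the two

# The standard library offers the same two calculations.
print(statistics.pvariance(data), statistics.variance(data))
```

For the same data, the n-1 version is always somewhat larger than the N version, and the gap shrinks as n grows.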

5 of 33

Sample Range

If the n observations in a sample are denoted by x1, x2, …, xn, the sample range is:

r = max(xi) – min(xi)

It is the largest observation in the sample less the smallest observation.

From Example 6-3:

r = 13.6 – 12.3 = 1.30

Note that: population range ≥ sample range
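
The range calculation can be sketched in a few lines. The values below are hypothetical, chosen only so that the largest and smallest observations match the 13.6 and 12.3 quoted above; `sample_range` is an illustrative name.

```python
def sample_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)

# Hypothetical population and a sample taken from it.
population = [12.1, 12.3, 12.6, 12.9, 13.1, 13.4, 13.5, 13.6, 13.9]
sample = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5]

r = sample_range(sample)
print(round(r, 2))

# A sample can never span more than the population it came from.
print(sample_range(population) >= r)
```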

6 of 33

Types of Presentation Methods for Data on Variables:

4 Types:

1- Numerical.

2- Graphical.

3- Mathematical Methods.

4- Statistical Presentation of Data.

7 of 33

1- Numerical Data Presentation

  • Ordered

  • Tabular: frequency tally table (simple or cumulative).

** Both can be done whether the data are quantitative or qualitative.
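
A simple and cumulative frequency tally can be sketched as follows. The category values and the name `frequency_table` are assumptions made for illustration; the same function works for qualitative labels or discrete numbers.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Return (category, frequency, cumulative frequency) rows in sorted order."""
    counts = Counter(values)
    categories = sorted(counts)
    freqs = [counts[c] for c in categories]
    cumulative = accumulate(freqs)  # running total of the frequencies
    return list(zip(categories, freqs, cumulative))

# Hypothetical qualitative observations.
blood_groups = ["A", "O", "B", "O", "A", "AB", "O", "A", "O", "B"]

print("Group  Tally  Freq  Cum")
for category, freq, cum in frequency_table(blood_groups):
    print(f"{category:<6} {'|' * freq:<6} {freq:>4} {cum:>4}")
```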

8 of 33

9 of 33

2- Graphical Data Presentation

** For one variable:

  • Bar (column) chart: categorical or discrete.
  • Pie chart: categorical or discrete.
  • Histogram: continuous.
  • Line graph: continuous.
  • Frequency Polygon: continuous.
  • Stem-and-leaf plot: discrete.
  • Dot plot: discrete.
  • Box plot: numerical data.

** For two variables:

  • Clustered bar charts
  • Scatter diagram


10 of 33

Bar (column) chart

  • Bar or column chart: a separate horizontal or vertical bar is drawn for each category, its length being proportional to the frequency in that category.

  • The bars are separated by small gaps to indicate that the data are categorical or discrete.

  • A bar chart can be drawn either horizontally or vertically.

11 of 33

12 of 33

Pie chart

  • Pie chart: a circular 'pie' is split into sections, one for each category, so that the area of each section is proportional to the frequency in that category.

  • Continuous numerical data are often more difficult to display this way, as the data may need to be summarized (grouped) before being drawn.
  • Pie chart: categorical or discrete.
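
Since each section's area is proportional to its frequency, its central angle is 360° times the category's share of the total. A minimal sketch, assuming hypothetical category counts and an illustrative function name `sector_angles`:

```python
def sector_angles(frequencies):
    """Map each category to its central angle in degrees."""
    total = sum(frequencies.values())
    return {category: 360 * f / total for category, f in frequencies.items()}

# Hypothetical category frequencies.
counts = {"A": 3, "B": 2, "AB": 1, "O": 4}

for category, angle in sector_angles(counts).items():
    print(f"{category:<3} {angle:6.1f} degrees")
```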


13 of 33


14 of 33

Histogram

  • A histogram is similar to a bar chart, but there are no gaps between the bars because the data are continuous.
  • The width of each bar of the histogram relates to a range of values for the variable.
  • The area of the bar is proportional to the frequency in that range.
  • The histogram should be labeled carefully, to make it clear where the boundaries lie.
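
The grouping step behind a histogram can be sketched without any plotting library. This is an illustration under stated assumptions: the ages are hypothetical, the classes are equal-width and half-open [lower, upper), and `histogram_counts` is a name chosen here. With equal widths, bar height and bar area are both proportional to frequency.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width classes [lower, upper)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)  # index of the class containing v
        if not 0 <= idx < n_bins:
            raise ValueError(f"{v} lies outside the class boundaries")
        counts[idx] += 1
    edges = [start + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical continuous measurements (ages in years).
ages = [3, 7, 12, 15, 18, 21, 22, 25, 29, 31, 34, 38, 41, 47, 55]

edges, counts = histogram_counts(ages, start=0, width=10, n_bins=6)
for lower, upper, c in zip(edges, edges[1:], counts):
    # Labeling both boundaries makes it clear where each class begins and ends.
    print(f"[{lower:>2}, {upper:>2})  {'#' * c}")
```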


15 of 33


16 of 33

Frequency polygons

Like a histogram, a frequency polygon is a graph of a frequency distribution. We mark the number of observations within each interval with a single point placed at the midpoint of the interval, and then connect successive points with straight lines.

The data are continuous.
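
The points of a frequency polygon follow directly from the class boundaries and counts. A minimal sketch, assuming hypothetical boundaries and frequencies; `polygon_points` is an illustrative name:

```python
def polygon_points(edges, counts):
    """Pair each class midpoint with the number of observations in that class."""
    return [((lower + upper) / 2, c)
            for lower, upper, c in zip(edges, edges[1:], counts)]

# Hypothetical class boundaries and frequencies.
edges = [0, 10, 20, 30, 40]
counts = [2, 5, 4, 1]

points = polygon_points(edges, counts)
print(points)
```

Joining consecutive points with straight line segments gives the polygon.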


17 of 33

Frequency polygon example: age at death for 302 deaths from scarlet fever


18 of 33

Stem-and-Leaf Diagrams

  • Dot diagrams (dotplots) are useful for small data sets. Stem & leaf diagrams are better for large sets.
  • Steps to construct a stem-and-leaf diagram:
    1. Divide each number (xi) into two parts: a stem, consisting of the leading digits, and a leaf, consisting of the remaining digit.
    2. List the stem values in a vertical column (no skips).
    3. Record the leaf for each observation beside its stem.
    4. Write the units for the stems and leaves on the display.
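
The four steps above can be sketched in code. The assumptions here are that the data are non-negative integers, the last digit is the leaf, and the values and the name `stem_and_leaf` are illustrative rather than taken from Table 6-2.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its ordered list of leaves (last digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # step 1: split into stem and leaf
        leaves[stem].append(leaf)   # step 3: record the leaf beside its stem
    # Step 2: list every stem from smallest to largest, with no skips.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}

# Hypothetical strength measurements.
strengths = [105, 97, 121, 133, 118, 134, 110, 158, 120]

display = stem_and_leaf(strengths)
for stem, row in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in row)}")
print("Stem: tens, Leaf: ones")  # step 4: state the units
```

Stem 14 has no observations but still appears as an empty row, which keeps the shape of the distribution honest.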

19 of 33

Example : Alloy Strength

Figure 6-4 Stem-and-leaf diagram for Table 6-2 data. Center is about 155 and most data is between 110 and 200. Leaves are unordered.

20 of 33

Split Stems

  • The purpose of the stem-and-leaf is to describe the data distribution graphically.
  • If the data are too clustered, we can split and have multiple stems, thereby increasing the number of stems.
    • Split 2 for 1:
      • Lower stem for leaves 0, 1, 2, 3, 4
      • Upper stem for leaves 5, 6, 7, 8, 9
    • Split 5 for 1:
      • 1st stem for leaves 0, 1
      • 2nd stem for leaves 2, 3
      • 3rd stem for leaves 4, 5
      • 4th stem for leaves 6, 7
      • 5th stem for leaves 8, 9
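
Splitting can be sketched by keying each row on the stem and the leaf group. This assumes non-negative integers with the last digit as the leaf; the yield values and the name `split_stem_and_leaf` are hypothetical. For brevity, only non-empty rows are returned.

```python
def split_stem_and_leaf(values, parts):
    """Group leaves by (stem, sub-row); parts=2 gives 2-for-1, parts=5 gives 5-for-1."""
    if parts not in (1, 2, 5):
        raise ValueError("parts must be 1, 2, or 5")
    leaves_per_row = 10 // parts  # 5 leaves per row for 2-for-1, 2 for 5-for-1
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault((stem, leaf // leaves_per_row), []).append(leaf)
    return rows

# Hypothetical yields that crowd onto only two stems.
yields = [61, 64, 65, 67, 68, 70, 72, 73, 75, 76, 78, 79]

for (stem, sub_row), row in split_stem_and_leaf(yields, parts=2).items():
    print(f"{stem} ({sub_row}) | {' '.join(str(leaf) for leaf in row)}")
```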

21 of 33

Example 6-5: Chemical Yield Displays

Figure 6-5 (a) Stems not split; too compact

(b) Stems split 2-for-1; nice shape

(c) Stems split 5-for-1; too spread out

22 of 33

Box plot (box-and-whisker plot)

  • This is a vertical or horizontal rectangle, with the ends of the rectangle corresponding to the upper and lower quartiles of the data values.

  • A line drawn through the rectangle corresponds to the median value.

  • A tendency toward skewness is indicated when the median line is not centered in the box.


23 of 33

Box plot (box-and-whisker plot)

  • Whiskers, starting at the ends of the rectangle, usually indicate the minimum and maximum values but sometimes relate to particular percentiles, e.g. the 5th and 95th percentiles.

  • Outliers may be marked.
  • Commonly used for numerical data.
  • Can also serve a function similar to a bar chart (discrete or categorical data).
  • Summary: can be used with all types of data.


24 of 33

Box plots provide basic information about a distribution. For example, a distribution with a positive skew would have a longer whisker in the positive direction than in the negative direction. A larger mean than median would also indicate a positive skew. Box plots are good at portraying extreme values and are especially good at showing differences between distributions.

Box plots are often used to compare several series of data. It is perfectly valid to create a box plot for one series, although a histogram might give a more detailed or complete picture of the data.
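
The numbers a box plot draws can be sketched as a five-number summary. This is an illustration under assumptions: the data are hypothetical and right-skewed, the whiskers run to the minimum and maximum, and the quartiles come from Python's `statistics.quantiles` with its default "exclusive" method (other quartile conventions give slightly different values).

```python
import statistics

def five_number_summary(xs):
    """Return (minimum, q1, median, q3, maximum)."""
    q1, q2, q3 = statistics.quantiles(xs, n=4)  # default method="exclusive"
    return min(xs), q1, q2, q3, max(xs)

# Hypothetical right-skewed data.
data = [2, 3, 3, 4, 5, 6, 8, 12, 20]

lowest, q1, median, q3, highest = five_number_summary(data)
print(lowest, q1, median, q3, highest)

# Two signs of positive skew: a longer upper whisker and a mean above the median.
print("upper whisker longer:", (highest - q3) > (q1 - lowest))
print("mean above median:", statistics.mean(data) > median)
```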

A box plot can serve a function similar to that of a bar graph (discrete or categorical data).


25 of 33

Quartiles and Percentiles

  • The three quartiles partition the data into four equally sized counts or segments.
    • 25% of the data is less than q1.
    • 50% of the data is less than q2, the median.
    • 75% of the data is less than q3.

26 of 33

From Ungrouped Data:

Find the quartiles of the following numbers.

10, 72, 18, 45, 32, 56, 64, 27, 60

Solution: Arranging the numbers in ascending order of magnitude, we get

10, 18, 27, 32, 45, 56, 60, 64, 72

Here, n = 9.
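
The slide's own working is not reproduced in the text, so the following is only a sketch under one common convention: the k-th quartile sits at position k(n+1)/4 in the ordered data, with linear interpolation between neighboring values. Other conventions exist and can give different answers; the function name `quartile` is illustrative.

```python
def quartile(sorted_xs, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k(n+1)/4, interpolated."""
    n = len(sorted_xs)
    position = k * (n + 1) / 4
    whole = int(position)        # item just below the position
    fraction = position - whole  # how far to move toward the next item
    if whole < 1:
        return sorted_xs[0]
    if whole >= n:
        return sorted_xs[-1]
    below, above = sorted_xs[whole - 1], sorted_xs[whole]
    return below + fraction * (above - below)

ordered = sorted([10, 72, 18, 45, 32, 56, 64, 27, 60])

for k in (1, 2, 3):
    print(f"Q{k} =", quartile(ordered, k))
```

Under this rule, Q1 lies halfway between the 2nd and 3rd ordered values, Q2 is the 5th value, and Q3 lies halfway between the 7th and 8th.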

27 of 33

From ungrouped Frequency Data

28 of 33

29 of 33

30 of 33

From Frequency distribution with class interval:

31 of 33

32 of 33

Percentiles

  • The values that divide the ordered data into one hundred (100) equal parts are called percentiles.
  • These values are denoted by P1, P2, …, P99:
    • P1 = first percentile
    • P2 = second percentile
    • …
    • P99 = 99th percentile

33 of 33