1 of 32

Chi-square

Dr. Anshul Singh Thapa

2 of 32

An Introduction

  • The chi-square test is one of the simplest and most widely used non-parametric tests in statistical work.
  • The chi-square statistic describes the magnitude of the discrepancy between theory and observation.
  • The chi-square test was first used by Karl Pearson in the year 1900. It is defined as:
    • Chi-square (χ²) = Σ [(O – E)² / E]
  • Here O refers to the observed frequencies and E refers to the expected frequencies.
  • The chi-square test is used when the scores are on a nominal scale (that is, a variable whose values are categories).
  • The basic idea of the chi-square test is to compare how well the observed frequencies of people across various categories fit some expected frequencies.
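The definition above can be sketched in a few lines of Python. This is only an illustration: the function name and the three-category frequencies are made up for the example and do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up frequencies for three categories (illustration only).
observed = [18, 22, 20]
expected = [20, 20, 20]
print(round(chi_square_statistic(observed, expected), 2))  # 0.4
```

A value of zero would mean that every observed frequency equals its expected frequency.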

3 of 32

Chi-square performs two types of functions:

Goodness of fit:

A common use is to assess whether an observed set of frequencies follows an expected pattern. The expected frequencies may be determined from prior knowledge (such as the previous year's exam results) or by calculating an average from the given data.

The chi-square test for goodness of fit is used with the levels of a single nominal variable.

Measure of independence:

The chi-square test can also be used in the reverse manner to goodness of fit. If two sets of measures are compared, we can determine whether they are related to each other or independent of each other.

The chi-square test for independence is used when there are two nominal variables, each with several categories.

4 of 32

Steps to determine chi-square

  • Write the observed frequencies in column O.
  • Work out the expected frequencies and write them in column E.

Expected frequencies:

Expected frequencies for chi-square can be found in three ways:

    • When we hypothesize that the frequencies are equal in every category, the expected frequency is calculated by dividing the sample size by the number of categories.
    • We can also determine the expected frequencies on the basis of some prior knowledge.

Goodness of fit test
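As a rough sketch of the two goodness-of-fit cases just described, the helpers below compute expected frequencies either from an equal split or from proportions taken from prior knowledge. The function names, the observed counts and the proportions are all hypothetical.

```python
def expected_equal(observed):
    """Equal-frequency hypothesis: sample size divided by the number of categories."""
    n = sum(observed)
    k = len(observed)
    return [n / k] * k


def expected_from_proportions(observed, proportions):
    """Prior-knowledge hypothesis: sample size times each expected proportion."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must add up to 1")
    n = sum(observed)
    return [n * p for p in proportions]


# Hypothetical observed counts for three categories (sample size 120).
observed = [30, 50, 40]
print(expected_equal(observed))                                # [40.0, 40.0, 40.0]
print(expected_from_proportions(observed, [0.25, 0.5, 0.25]))  # [30.0, 60.0, 30.0]
```

In both cases the expected frequencies add up to the same total as the observed frequencies.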

5 of 32

The third way is to calculate the expected frequencies from the row and column totals. In general, the expected frequency for any cell can be calculated from the following equation:

E = (RT × CT) / N

E = expected frequency

RT = the row total for the row containing the cell

CT = the column total for the column containing the cell

N = the total number of observations

  • Take the difference between the observed and expected frequencies and square each difference, i.e., obtain the values of (O – E)².
  • Divide each value of (O – E)² by the respective expected frequency and add up the results to obtain the total Σ [(O – E)² / E].
  • This gives the value of chi-square, which can range from zero to infinity. If chi-square is zero, the observed and expected frequencies coincide completely. The greater the discrepancy between the observed and expected frequencies, the greater the value of chi-square.

Measure of independence
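The row-total and column-total formula above can be applied to every cell of a table at once. The sketch below assumes the table is given as a list of rows; the function name and the 2 × 2 counts are hypothetical.

```python
def expected_table(observed):
    """Expected frequency of each cell: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Hypothetical 2 x 2 table of observed counts (N = 100).
table = [[10, 20],
         [30, 40]]
print(expected_table(table))  # [[12.0, 18.0], [28.0, 42.0]]
```

The expected table keeps the same row and column totals as the observed table.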

6 of 32

Steps of Hypothesis Testing

  • Formulate the research question as a research hypothesis and a null hypothesis.
  • Determine the characteristics of the comparison distribution
    • Chi-square distribution
    • Degrees of freedom (df = number of categories – 1)
  • Determine the cutoff on the comparison distribution at which the null hypothesis should be rejected
    • Level of significance
    • Degrees of freedom
    • Table value
  • Determine your sample's score on the comparison distribution
    • Determine the actual, observed frequency in each category
    • Determine the expected frequency in each category
    • Take the observed minus the expected frequency in each category
    • Square each of these differences
    • Divide each squared difference by the expected frequency and add up the results for all the categories
  • Decide whether or not to reject the null hypothesis
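The cutoff step can be sketched as a small table lookup. The dictionary below holds the usual printed table values for the 0.05 level, rounded to three decimals; the function names are hypothetical, and only df 1 to 5 are covered in this sketch.

```python
# Upper-tail critical values of chi-square at the 0.05 level of significance,
# as printed in standard tables (three decimals), for df = 1 to 5 only.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def degrees_of_freedom(n_categories):
    """Goodness of fit: df = number of categories - 1."""
    return n_categories - 1


def cutoff(n_categories):
    """Return (df, table value at the 0.05 level) for a goodness-of-fit test."""
    df = degrees_of_freedom(n_categories)
    if df not in CRITICAL_05:
        raise ValueError("this sketch only covers df = 1 to 5")
    return df, CRITICAL_05[df]


print(cutoff(3))  # (2, 5.991)
```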

7 of 32

8 of 32

  • The calculated value of chi-square is compared with the table value of chi-square for the given degrees of freedom at a chosen level of significance (generally the 5% level). If the calculated value of chi-square is greater than the table value, the difference between theory and observation is considered significant: we reject the null hypothesis and accept the alternative hypothesis.
  • If the calculated value of chi-square is less than the table value, the difference between theory and observation is not considered significant, i.e., it is regarded as due to fluctuations in sampling: we do not reject the null hypothesis, and the alternative hypothesis is not supported.
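The comparison rule can be written as a single function. The function name and the calculated values passed to it are hypothetical; 5.991 is the standard table value for df = 2 at the 0.05 level.

```python
def decide(calculated, table_value):
    """Compare the calculated chi-square with the table value."""
    if calculated > table_value:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"


# Hypothetical calculated values compared with the table value for df = 2.
print(decide(12.3, 5.991))  # reject the null hypothesis
print(decide(2.1, 5.991))   # do not reject the null hypothesis
```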

14 of 32

Mental health disorders due to stress among the youth of the US region (N = 1000)

Condition                  O      E      (O – E)   (O – E)²   (O – E)² / E
Anxiety disorder           134    146    -12       144        0.99
Alcohol and drug abuse     160    164    -4        16         0.10
Mood disorder              97     83     14        196        2.36
Schizophrenia              12     15     -3        9          0.60
None of these conditions   597    592    5         25         0.04
Total                      1000   1000                        4.09

χ² = Σ [(O – E)² / E] = 4.09
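The worked table can be reproduced column by column. The sketch below uses the observed and expected frequencies from the slide; the variable names are chosen for the example.

```python
conditions = [
    "Anxiety disorder",
    "Alcohol and drug abuse",
    "Mood disorder",
    "Schizophrenia",
    "None of these conditions",
]
observed = [134, 160, 97, 12, 597]
expected = [146, 164, 83, 15, 592]

# Build one row per category: O, E, (O - E), (O - E)^2, (O - E)^2 / E.
rows = []
for name, o, e in zip(conditions, observed, expected):
    diff = o - e
    rows.append((name, o, e, diff, diff ** 2, diff ** 2 / e))

# Chi-square is the sum of the last column.
chi_square = sum(row[-1] for row in rows)

for name, o, e, diff, squared, contribution in rows:
    print(f"{name:<26}{o:>5}{e:>5}{diff:>6}{squared:>6}{contribution:>7.2f}")
print(f"chi-square = {chi_square:.2f}")  # chi-square = 4.09
```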

15 of 32

Steps of Hypothesis Testing

  • Formulate the research question as a research hypothesis and a null hypothesis.
    • H0 = The distribution of mental health conditions in the population under stress is the same as the theory predicts
    • Ha = The distribution of mental health conditions in the population under stress differs from what the theory predicts
  • Determine the characteristics of the comparison distribution
    • Chi-square distribution
    • Degrees of freedom (df = number of categories – 1 = 5 – 1 = 4)
  • Determine the cutoff on the comparison distribution at which the null hypothesis should be rejected
    • Level of significance = 0.05
    • Degrees of freedom = 4
    • Table value = 9.488
  • Determine your sample's score on the comparison distribution
    • Determine the actual, observed frequency in each category
    • Determine the expected frequency in each category
    • Take the observed minus the expected frequency in each category
    • Square each of these differences
    • Divide each squared difference by the expected frequency and add up the results for all the categories = 4.09
  • Decide whether or not to reject the null hypothesis
    • Table value at 0.05, df 4 (9.488) > calculated value (4.09), so the null hypothesis is not rejected
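The same decision can be reached with a p-value instead of a table lookup. This is an alternative not used on the slides: for an even number of degrees of freedom the upper-tail probability of the chi-square distribution has a simple closed form, which the hypothetical helper below evaluates with the standard library only.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for an even df, using the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    term = 1.0   # k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p = chi2_upper_tail_even_df(4.09, 4)
print(round(p, 3))
print("reject H0" if p < 0.05 else "do not reject H0")
```

For the calculated value 4.09 with df = 4 the p-value comes out at roughly 0.39, well above 0.05, which agrees with the table comparison.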

16 of 32

17 of 32

26 of 32

In a COVID-19 vaccination campaign in a certain area, Covishield was administered to 812 persons out of a total population of 3,248. The number of fever cases is shown below:

Treatment       Fever   No fever   Total
Covishield      20      792        812 (25%)
No Covishield   220     2216       2436 (75%)
Total           240     3008       3248 (100%)

Let us take the hypothesis that Covishield is not effective in checking COVID-19.

Applying chi-square:

  • Find the expected frequency by applying the formula E = (RT × CT) / N = (812 × 240) / 3248 = 60
  • The expected frequency corresponding to the first row and first column is 60. The remaining expected frequencies follow by subtraction from the row and column totals, so the table of expected frequencies is:

Treatment       Fever   No fever   Total
Covishield      60      752        812
No Covishield   180     2256       2436
Total           240     3008       3248
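The way the expected table is filled in can be sketched as follows: one cell comes from the formula and, because the row and column totals are fixed, the other three follow by subtraction. The variable names are chosen for the example; the totals are those of the table above.

```python
row_totals = [812, 2436]   # Covishield, No Covishield
col_totals = [240, 3008]   # Fever, No fever
n = 3248

# One cell from the formula E = RT * CT / N.
e_fever_vaccinated = row_totals[0] * col_totals[0] / n

# The remaining cells follow from the fixed totals.
e_nofever_vaccinated = row_totals[0] - e_fever_vaccinated
e_fever_unvaccinated = col_totals[0] - e_fever_vaccinated
e_nofever_unvaccinated = row_totals[1] - e_fever_unvaccinated

print(e_fever_vaccinated, e_nofever_vaccinated)      # 60.0 752.0
print(e_fever_unvaccinated, e_nofever_unvaccinated)  # 180.0 2256.0
```

Only one cell can be chosen freely once the totals are fixed, which is why a 2 × 2 table has a single degree of freedom.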

31 of 32

                             O      E      (O – E)²   (O – E)² / E
Fever and Covishield         20     60     1600       26.667
Fever and No Covishield      220    180    1600       8.889
No fever and Covishield      792    752    1600       2.128
No fever and No Covishield   2216   2256   1600       0.709

Σ [(O – E)² / E] = 38.393

  • Degrees of freedom (v) = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1
  • Level of significance = 0.05
  • Table value of chi-square = 3.84
  • The calculated value of chi-square is greater than the table value, so the hypothesis is rejected. Hence Covishield is useful in checking COVID-19.
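The whole independence test for this example can be sketched end to end. The function names are hypothetical. The p-value helper is an addition not on the slides and is valid only for df = 1, where the chi-square variable is the square of a standard normal variable.

```python
import math


def chi_square_independence(observed):
    """Return (chi-square, df) for a table of observed counts given as rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # E = RT * CT / N
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: Covishield, No Covishield. Columns: Fever, No fever.
chi2, df = chi_square_independence([[20, 792], [220, 2216]])
print(round(chi2, 2), df)
print(p_value_df1(chi2) < 0.05)  # True
```

Computed without rounding the individual terms, the statistic is about 38.39, far above the table value of 3.84, in line with the conclusion above.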

32 of 32